Preface for Digital Signal Processing: A User's Guide


Digital signal processing (DSP) has matured in the past few decades from 
an obscure research discipline to a large body of practical methods with 
very broad application. Both practicing engineers and students specializing 
in signal processing need a clear exposition of the ideas and methods 
comprising the core signal processing "toolkit" so widely used today. 


This text reflects my belief that the skilled practitioner must understand the 
key ideas underlying the algorithms to select, apply, debug, extend, and 
innovate most effectively; only with real insight can the engineer make 
novel use of these methods in the seemingly infinite range of new problems 
and applications. It also reflects my belief that the needs of the typical 
student and the practicing engineer have converged in recent years; as the 
discipline of signal processing has matured, these core topics have become 
less a subject of active research and more a set of tools applied in the course
of other research. The modern student thus has less need for exhaustive 
coverage of the research literature and detailed derivations and proofs as 
preparation for their own research on these topics, but greater need for 
intuition and practical guidance in their most effective use. The majority of 
students eventually become practicing engineers themselves and benefit 
from the best preparation for their future careers. 


This text both explains the principles of classical signal processing methods 
and describes how they are used in engineering practice. It is thus much 
more than a recipe book; it describes the ideas behind the algorithms, gives 
analyses when they enhance that understanding, and includes derivations 
that the practitioner may need to extend when applying these methods to 
new situations. Analyses or derivations that are only of research interest or 
that do not increase intuitive understanding are left to the references. It is 
also much more than a theory book; it contains more description of 
common applications, discussion of actual implementation issues, 
comments on what really works in the real world, and practical "know- 
how" than found in the typical academic textbook. The choice of material 
emphasizes those methods that have found widespread practical use; 
techniques that have been the subject of intense research but which are
rarely used in practice (for example, RLS adaptive filter algorithms) often
receive only limited coverage.


The text assumes a familiarity with basic signal processing concepts such as 
ideal sampling theory, continuous and discrete Fourier transforms, 
convolution and filtering. It evolved from a set of notes for a second signal 
processing course, ECE 451: Digital Signal Processing II, in Electrical and 
Computer Engineering at the University of Illinois at Urbana-Champaign, 
aimed at second-semester seniors or first-semester graduate students in 
signal processing. Over the years, it has been enhanced substantially to 
include descriptions of common applications, sometimes hard-won 
knowledge about what actually works and what doesn't, useful tricks, 
important extensions known to experienced engineers but rarely discussed 
in academic texts, and other relevant "know-how" to aid the real-world user. 
This is necessarily an ongoing process, and I continue to expand and refine 
this component as my own practical knowledge and experience grow. The
topics are the core signal processing methods that are used in the majority
of signal processing applications: discrete Fourier analysis and FFTs, digital
filter design, adaptive filtering, multirate signal processing, and efficient 
algorithm implementation and finite-precision issues. While many of these 
topics are covered at an introductory level in a first course, this text aspires 
to cover all of the methods, both basic and advanced, in these areas which 
see widespread use in practice. I have also attempted to make the individual 
modules and sections somewhat self-sufficient, so that those who seek 
specific information on a single topic can quickly find what they need. 
Hopefully these aspirations will eventually be achieved; in the meantime, I 
welcome your comments, corrections, and feedback so that I can continue 
to improve this text. 


As of August 2006, the majority of modules are unedited transcriptions of 
handwritten notes and may contain typographical errors and insufficient 
descriptive text for documents unaccompanied by an oral lecture; I hope to 
have all of the modules in at least presentable shape by the end of the year. 


Publication of this text in Connexions would have been impossible without 
the help of many people. A huge thanks to the various permanent and 
temporary staff at Connexions is due, in particular to those who converted
the text and equations from my original handwritten notes into CNXML
and MathML. My former and current faculty colleagues at the University of 
Illinois who have taught the second DSP course over the years have had a 
substantial influence on the evolution of the content, as have the students 
who have inspired this work and given me feedback. I am very grateful to 
my teachers, mentors, colleagues, collaborators, and fellow engineers who 
have taught me the art and practice of signal processing; this work is 
dedicated to you. 


Discrete-Time Signals and Systems 


Mathematically, analog signals are functions having as their independent 
variables continuous quantities, such as space and time. Discrete-time 
signals are functions defined on the integers; they are sequences. As with 
analog signals, we seek ways of decomposing discrete-time signals into 
simpler components. Because this approach leads to a better understanding 
of signal structure, we can exploit that structure to represent information 
(create ways of representing information with signals) and to extract 
information (retrieve the information thus represented). For symbolic- 
valued signals, the approach is different: We develop a common 
representation of all symbolic-valued signals so that we can embody the 
information they contain in a unified way. From an information
representation perspective, the most important issue becomes, for both
real-valued and symbolic-valued signals, efficiency: what is the most
parsimonious and compact way to represent information so that it can be
extracted later?


Real- and Complex-valued Signals 


A discrete-time signal is represented symbolically as $s(n)$, where
$n = \{\ldots, -1, 0, 1, \ldots\}$.

[Figure: Cosine. The discrete-time cosine signal is plotted as a stem plot. Can
you find the formula for this signal?]


We usually draw discrete-time signals as stem plots to emphasize the fact that
they are functions defined only on the integers. We can delay a
discrete-time signal by an integer just as with analog ones. A signal delayed by $m$
samples has the expression $s(n - m)$.


Complex Exponentials 


The most important signal is, of course, the complex exponential
sequence.
Equation:
$$ s(n) = e^{i 2\pi f n} $$


Note that the frequency variable f is dimensionless and that adding an 
integer to the frequency of the discrete-time complex exponential has no 
effect on the signal's value. 

Equation:
$$ e^{i 2\pi (f+m) n} = e^{i 2\pi f n}\, e^{i 2\pi m n} = e^{i 2\pi f n} $$

This derivation follows because the complex exponential evaluated at an
integer multiple of $2\pi$ equals one. Thus, we need only consider frequency
to have a value in some unit-length interval.
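The periodicity in frequency can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the values of `f` and `m` are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

n = np.arange(-8, 9)      # integer sample indices
f = 0.15                  # dimensionless frequency (illustrative value)
m = 3                     # any integer offset added to the frequency

s = np.exp(1j * 2 * np.pi * f * n)
s_shifted = np.exp(1j * 2 * np.pi * (f + m) * n)

# The two sequences agree sample by sample (up to rounding error).
same_signal = np.allclose(s, s_shifted)
print(same_signal)
```

Replacing `m` by a non-integer value makes the comparison fail, which is why only a unit-length interval of frequencies is needed.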


Sinusoids 


Discrete-time sinusoids have the obvious form $s(n) = A\cos(2\pi f n + \phi)$.
As opposed to analog complex exponentials and sinusoids that can have
their frequencies be any real value, frequencies of their discrete-time
counterparts yield unique waveforms only when $f$ lies in the interval
$(-\frac{1}{2}, \frac{1}{2}]$. This choice of frequency interval is arbitrary; we can also choose
the frequency to lie in the interval $[0, 1)$. How to choose a unit-length
interval for a sinusoid's frequency will become evident later.


Unit Sample 


The second-most important discrete-time signal is the unit sample, which
is defined to be
Equation:
$$ \delta(n) = \begin{cases} 1 & \text{if } n = 0 \\ 0 & \text{otherwise} \end{cases} $$

[Figure: Unit sample. The unit sample.]


Examination of a discrete-time signal's plot, like that of the cosine signal
shown in [link], reveals that all signals consist of a sequence of delayed and
scaled unit samples. Because the value of a sequence at each integer $m$ is
denoted by $s(m)$ and the unit sample delayed to occur at $m$ is written
$\delta(n - m)$, we can decompose any signal as a sum of unit samples delayed
to the appropriate location and scaled by the signal value.

Equation:
$$ s(n) = \sum_{m=-\infty}^{\infty} s(m)\, \delta(n - m) $$


This kind of decomposition is unique to discrete-time signals, and will 
prove useful subsequently. 
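As an illustration, the decomposition into delayed, scaled unit samples can be sketched in a few lines. This assumes NumPy and a finite window of indices; the helper `unit_sample` and the example cosine are hypothetical choices made for the sketch.

```python
import numpy as np

def unit_sample(n):
    """Return 1 where n == 0 and 0 elsewhere."""
    return np.where(n == 0, 1.0, 0.0)

n = np.arange(-5, 6)
s = np.cos(2 * np.pi * 0.1 * n)   # hypothetical example signal

# Sum over m of s(m) * delta(n - m), restricted to the finite window.
rebuilt = np.zeros_like(s)
for m, s_m in zip(n, s):
    rebuilt += s_m * unit_sample(n - m)

decomposition_ok = np.allclose(rebuilt, s)
print(decomposition_ok)
```

Each pass through the loop places one scaled unit sample at location `m`; summing them recovers the signal on the window.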


Unit Step 


The unit step in discrete-time is well-defined at the origin, as opposed to 
the situation with analog signals. 


Equation:
$$ u(n) = \begin{cases} 1 & \text{if } n \ge 0 \\ 0 & \text{if } n < 0 \end{cases} $$


Symbolic Signals 


An interesting aspect of discrete-time signals is that their values do not need 
to be real numbers. We do have real-valued discrete-time signals like the 
sinusoid, but we also have signals that denote the sequence of characters 
typed on the keyboard. Such characters certainly aren't real numbers, and as 
a collection of possible signal values, they have little mathematical structure 
other than that they are members of a set. More formally, each element of 
the symbolic-valued signal $s(n)$ takes on one of the values $\{a_1, \ldots, a_K\}$
which comprise the alphabet $A$. This technical terminology does not mean
we restrict symbols to being members of the English or Greek alphabet.
They could represent keyboard characters, bytes (8-bit quantities), or integers
that convey daily temperature. Whether controlled by software or not,
discrete-time systems are ultimately constructed from digital circuits, which 
consist entirely of analog circuit elements. Furthermore, the transmission 
and reception of discrete-time signals, like e-mail, is accomplished with 
analog signals and systems. Understanding how discrete-time and analog 
signals and systems intertwine is perhaps the main goal of this course. 


Discrete-Time Systems 


Discrete-time systems can act on discrete-time signals in ways similar to 
those found in analog signals and systems. Because of the role of software
in discrete-time systems, many more different systems can be envisioned
and "constructed" with programs than can be with analog signals. In fact, a 
special class of analog signals can be converted into discrete-time signals, 
processed with software, and converted back into an analog signal, all 
without the incursion of error. For such signals, systems can be easily 
produced in software, with equivalent analog realizations difficult, if not 
impossible, to design. 


Systems in the Time-Domain 


A discrete-time signal $s(n)$ is delayed by $n_0$ samples when we write $s(n - n_0)$, with
$n_0 > 0$. Choosing $n_0$ to be negative advances the signal along the integers. As
opposed to analog delays, discrete-time delays can only be integer valued. In the
frequency domain, delaying a signal corresponds to a linear phase shift of the signal's
discrete-time Fourier transform: $s(n - n_0) \leftrightarrow e^{-i 2\pi f n_0} S(e^{i 2\pi f})$.


Linear discrete-time systems have the superposition property.
Equation:
Superposition
$$ S(a_1 x_1(n) + a_2 x_2(n)) = a_1 S(x_1(n)) + a_2 S(x_2(n)) $$

A discrete-time system is called shift-invariant (analogous to time-invariant analog
systems) if delaying the input delays the corresponding output.
Equation:
Shift-Invariant
$$ \text{If } S(x(n)) = y(n), \text{ then } S(x(n - n_0)) = y(n - n_0) $$


We use the term shift-invariant to emphasize that delays can only have integer values 
in discrete-time, while in analog signals, delays can be arbitrarily valued. 


We want to concentrate on systems that are both linear and shift-invariant. It will be 
these that allow us the full power of frequency-domain analysis and implementations. 
Because we have no physical constraints in "constructing" such systems, we need 
only a mathematical specification. In analog systems, the differential equation 
specifies the input-output relationship in the time-domain. The corresponding 
discrete-time specification is the difference equation. 
Equation: 

The Difference Equation 


$$ y(n) = a_1 y(n-1) + \cdots + a_p y(n-p) + b_0 x(n) + b_1 x(n-1) + \cdots + b_q x(n-q) $$

Here, the output signal $y(n)$ is related to its past values $y(n - l)$, $l = \{1, \ldots, p\}$, and
to the current and past values of the input signal $x(n)$. The system's characteristics
are determined by the choices for the number of coefficients $p$ and $q$ and the
coefficients' values $\{a_1, \ldots, a_p\}$ and $\{b_0, b_1, \ldots, b_q\}$.


Note: There is an asymmetry in the coefficients: where is $a_0$? This coefficient would
multiply the $y(n)$ term in the difference equation. We have essentially divided the
equation by it, which does not change the input-output relationship. We have thus
created the convention that $a_0$ is always one.


As opposed to differential equations, which only provide an implicit description of a 
system (we must somehow solve the differential equation), difference equations 
provide an explicit way of computing the output for any input. We simply express the 
difference equation by a program that calculates each output from the previous output 
values, and the current and previous inputs. 
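One way such a program might look is sketched below in plain Python. The function name `difference_equation` and the convention that all samples before $n = 0$ are zero are assumptions of this sketch, not part of the text.

```python
def difference_equation(a, b, x):
    """Compute y(n) = sum_{l=1..p} a[l-1]*y(n-l) + sum_{k=0..q} b[k]*x(n-k).

    Samples before n = 0 are taken to be zero (an assumption of this sketch).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Current and past inputs, weighted by b_0, ..., b_q.
        for k, b_k in enumerate(b):
            if n - k >= 0:
                acc += b_k * x[n - k]
        # Past outputs, weighted by a_1, ..., a_p.
        for l, a_l in enumerate(a, start=1):
            if n - l >= 0:
                acc += a_l * y[n - l]
        y.append(acc)
    return y

# First-order example: y(n) = 0.5 y(n-1) + x(n), driven by a unit sample.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
response = difference_equation(a=[0.5], b=[1.0], x=impulse)
print(response)  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

Driving the system with a unit sample, as here, produces the first samples of its unit-sample response.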


Discrete Time Convolution 

Convolution is a concept that extends to all systems that are both linear and 
time-invariant (LTI). It will become apparent in this discussion that this
condition is necessary by demonstrating how linearity and time-invariance 
give rise to convolution. 


Introduction 


Convolution, one of the most important concepts in electrical engineering, 
can be used to determine the output a system produces for a given input 
signal. It can be shown that a linear time invariant system is completely 
characterized by its impulse response. The sifting property of the discrete 
time impulse function tells us that the input signal to a system can be 
represented as a sum of scaled and shifted unit impulses. Thus, by linearity, 
it would seem reasonable to compute the output signal as the sum of
scaled and shifted unit impulse responses. That is exactly what the 
operation of convolution accomplishes. Hence, convolution can be used to 
determine a linear time invariant system's output from knowledge of the 
input and the impulse response. 


Convolution and Circular Convolution 


Convolution 


Operation Definition 


Discrete time convolution is an operation on two discrete time signals
defined by the sum
Equation:
$$ (f * g)[n] = \sum_{k=-\infty}^{\infty} f[k]\, g[n-k] $$

for all signals $f, g$ defined on $\mathbb{Z}$. It is important to note that the operation of
convolution is commutative, meaning that
Equation:
$$ f * g = g * f $$

for all signals $f, g$ defined on $\mathbb{Z}$. Thus, the convolution operation could
have been just as easily stated using the equivalent definition
Equation:
$$ (f * g)[n] = \sum_{k=-\infty}^{\infty} f[n-k]\, g[k] $$


for all signals f, g defined on Z. Convolution has several other important 
properties not listed here but explained and derived in a later module. 
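For finite-length sequences the defining sum can be evaluated directly. The sketch below assumes NumPy and sequences that both start at $n = 0$ and are zero elsewhere; `convolve_sum` is a hypothetical helper written for illustration, and `np.convolve` serves only as a cross-check.

```python
import numpy as np

def convolve_sum(f, g):
    """Linear convolution of finite sequences that both start at n = 0."""
    out = np.zeros(len(f) + len(g) - 1)
    for n in range(len(out)):
        for k in range(len(f)):
            # g[n - k] is zero outside its support, so skip those terms.
            if 0 <= n - k < len(g):
                out[n] += f[k] * g[n - k]
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.5, -1.0])

fg = convolve_sum(f, g)   # f * g
gf = convolve_sum(g, f)   # g * f, equal by commutativity
print(fg)                 # [ 0.5  0.  -0.5 -3. ]
```

Swapping the arguments gives the same result, which is the commutativity property stated above.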


Definition Motivation 


The above operation definition has been chosen to be particularly useful in 
the study of linear time invariant systems. In order to see this, consider a 
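linear time invariant system $H$ with unit impulse response $h$. Given a
system input signal $x$ we would like to compute the system output signal
$H(x)$. First, we note that the input can be expressed as the convolution
Equation:
$$ x[n] = \sum_{k=-\infty}^{\infty} x[k]\, \delta[n-k] $$

by the sifting property of the unit impulse function. By linearity
Equation:
$$ Hx[n] = \sum_{k=-\infty}^{\infty} x[k]\, H\delta[n-k]. $$

Since $H\delta[n-k]$ is the shifted unit impulse response $h[n-k]$, this gives
the result
Equation:
$$ Hx[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k] = (x * h)[n]. $$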


Hence, convolution has been defined such that the output of a linear time 
invariant system is given by the convolution of the system input with the 
system unit impulse response. 


Graphical Intuition 


It is often helpful to be able to visualize the computation of a convolution in
terms of graphical processes. Consider the convolution of two functions
$f, g$ given by
Equation:
$$ (f * g)[n] = \sum_{k=-\infty}^{\infty} f[k]\, g[n-k] = \sum_{k=-\infty}^{\infty} f[n-k]\, g[k]. $$

The first step in graphically understanding the operation of convolution is to
plot each of the functions. Next, one of the functions must be selected, and
its plot reflected across the $k = 0$ axis. For each integer $n$, that same function
must be shifted right by $n$ (the reflected plot of $g[-k]$ becomes $g[n-k]$). The
point-wise product of the two resulting plots is then computed, and then all of
the values are summed.


Example:
Recall that the impulse response for a discrete time echoing feedback
system with gain $a$ is

Equation:
$$ h[n] = a^n u[n], $$

and consider the response to an input signal that is another exponential
Equation:
$$ x[n] = b^n u[n]. $$

We know that the output for this input is given by the convolution of the
impulse response with the input signal
Equation:
$$ y[n] = x[n] * h[n]. $$

We would like to compute this operation by beginning in a way that
minimizes the algebraic complexity of the expression. However, in this
case, each possible choice is equally simple. Thus, we would like to
compute

Equation:
$$ y[n] = \sum_{k=-\infty}^{\infty} a^k u[k]\, b^{n-k} u[n-k]. $$

The step functions can be used to further simplify this sum. Therefore,
Equation:
$$ y[n] = 0 $$

for $n < 0$ and
Equation:
$$ y[n] = \sum_{k=0}^{n} a^k b^{n-k} = b^n \sum_{k=0}^{n} \left(a b^{-1}\right)^k $$

for $n \ge 0$. Hence, provided $a b^{-1} \ne 1$ (that is, $a \ne b$), we have that
Equation:
$$ y[n] = \begin{cases} 0 & n < 0 \\ \dfrac{b^{n+1} - a^{n+1}}{b - a} & n \ge 0 \end{cases} $$
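Summing the geometric series $\sum_{k=0}^{n} a^k b^{n-k}$ gives $(b^{n+1} - a^{n+1})/(b - a)$ for $n \ge 0$ when $a \ne b$. A quick numerical comparison can confirm this; the sketch assumes NumPy, and the values of `a`, `b` and `N` are arbitrary illustrative choices.

```python
import numpy as np

a, b = 0.5, 0.8          # hypothetical gains with a != b
N = 20                   # number of output samples to compare
n = np.arange(N)

h = a ** n               # h[n] = a^n u[n], truncated to n < N
x = b ** n               # x[n] = b^n u[n], truncated to n < N

# The first N samples of the full convolution are unaffected by truncation,
# because y[n] only involves samples of x and h with indices 0..n.
y_numeric = np.convolve(x, h)[:N]
y_closed = (b ** (n + 1) - a ** (n + 1)) / (b - a)

closed_form_matches = np.allclose(y_numeric, y_closed)
print(closed_form_matches)
```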


Circular Convolution 


Discrete time circular convolution is an operation on two finite length or
periodic discrete time signals defined by the sum
Equation:
$$ (f \circledast g)[n] = \sum_{k=0}^{N-1} \hat{f}[k]\, \hat{g}[n-k] $$

for all signals $f, g$ defined on $\mathbb{Z}[0, N-1]$ where $\hat{f}, \hat{g}$ are periodic
extensions of $f$ and $g$. It is important to note that the operation of circular
convolution is commutative, meaning that

Equation:
$$ f \circledast g = g \circledast f $$

for all signals $f, g$ defined on $\mathbb{Z}[0, N-1]$. Thus, the circular convolution
operation could have been just as easily stated using the equivalent
definition

Equation:
$$ (f \circledast g)[n] = \sum_{k=0}^{N-1} \hat{f}[n-k]\, \hat{g}[k] $$

for all signals $f, g$ defined on $\mathbb{Z}[0, N-1]$ where $\hat{f}, \hat{g}$ are periodic
extensions of $f$ and $g$. Circular convolution has several other important
properties not listed here but explained and derived in a later module.


Alternatively, discrete time circular convolution can be expressed as the
sum of two summations given by

Equation:
$$ (f \circledast g)[n] = \sum_{k=0}^{n} f[k]\, g[n-k] + \sum_{k=n+1}^{N-1} f[k]\, g[n-k+N] $$

for all signals $f, g$ defined on $\mathbb{Z}[0, N-1]$.
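The two-summation form translates directly into a short program. The sketch below assumes NumPy; `circular_convolve` is a hypothetical helper, and the comparison against the product of DFTs is included only as a cross-check of the wrap-around indexing.

```python
import numpy as np

def circular_convolve(f, g):
    """Circular convolution of two length-N sequences via the two-sum form."""
    N = len(f)
    out = np.zeros(N)
    for n in range(N):
        # Terms where n - k stays inside 0..N-1.
        for k in range(0, n + 1):
            out[n] += f[k] * g[n - k]
        # Terms where n - k is negative and wraps around by N.
        for k in range(n + 1, N):
            out[n] += f[k] * g[n - k + N]
    return out

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0, 0.0])

direct = circular_convolve(f, g)
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))
print(direct)  # [-2. -2.  2.  2.]
```

Here `g` picks out `f[n] - f[(n - 2) mod 4]`, so the wrap-around is easy to verify by hand.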


Meaningful examples of computing discrete time circular convolutions in 
the time domain would involve complicated algebraic manipulations 
dealing with the wrap around behavior, which would ultimately be more 
confusing than helpful. Thus, none will be provided in this section. Of 
course, example computations in the time domain are easy to program and 
demonstrate. However, discrete time circular convolutions are more easily
computed using frequency domain tools as will be shown in the discrete 
time Fourier series section. 


Definition Motivation 


The above operation definition has been chosen to be particularly useful in 
the study of linear time invariant systems. In order to see this, consider a 
linear time invariant system $H$ with unit impulse response $h$. Given a
periodic system input signal $x$ we would like to compute the system output
signal $H(x)$. First, we note that the input can be expressed as the circular
convolution

Equation:
$$ x[n] = \sum_{k=0}^{N-1} \hat{x}[k]\, \hat{\delta}[n-k] $$

by the sifting property of the unit impulse function. By linearity,
Equation:
$$ Hx[n] = \sum_{k=0}^{N-1} \hat{x}[k]\, H\hat{\delta}[n-k]. $$

Since $H\hat{\delta}[n-k]$ is the shifted, periodically extended unit impulse response
$\hat{h}[n-k]$, this gives the result
Equation:
$$ Hx[n] = \sum_{k=0}^{N-1} \hat{x}[k]\, \hat{h}[n-k] = (x \circledast h)[n]. $$

Hence, circular convolution has been defined such that the output of a linear
time invariant system is given by the circular convolution of the system input
with the system unit impulse response.


Graphical Intuition 


It is often helpful to be able to visualize the computation of a circular
convolution in terms of graphical processes. Consider the circular
convolution of two finite length functions $f, g$ given by

Equation:
$$ (f \circledast g)[n] = \sum_{k=0}^{N-1} \hat{f}[k]\, \hat{g}[n-k] = \sum_{k=0}^{N-1} \hat{f}[n-k]\, \hat{g}[k]. $$

The first step in graphically understanding the operation of convolution is to
plot each of the periodic extensions of the functions. Next, one of the
functions must be selected, and its plot reflected across the $k = 0$ axis. For
each $n \in \mathbb{Z}[0, N-1]$, that same function must be shifted right by $n$. The
point-wise product of the two resulting plots is then computed, and finally
all of these values over one period are summed.




Convolution Summary 


Convolution, one of the most important concepts in electrical engineering, 
can be used to determine the output signal of a linear time invariant system 
for a given input signal with knowledge of the system's unit impulse 
response. The operation of discrete time convolution is defined such that it 
performs this function for infinite length discrete time signals and systems. 
The operation of discrete time circular convolution is defined such that it 
performs this function for finite length and periodic discrete time signals. In 
each case, the output of the system is the convolution or circular 
convolution of the input signal with the unit impulse response. 


Introduction to Fourier Analysis 
Lists the four Fourier transforms and when to use them. 


Fourier's Daring Leap 


Fourier postulated around 1807 that any periodic signal (equivalently finite
length signal) can be built up as an infinite linear combination of harmonic
sinusoidal waves.


That is, given the collection
Equation:
$$ B = \left\{ e^{j \frac{2\pi}{T} n t} \right\}_{n=-\infty}^{\infty} $$

any
Equation:
$$ f(t) \in L^2[0, T) $$

can be approximated arbitrarily closely by
Equation:
$$ f(t) = \sum_{n=-\infty}^{\infty} C_n\, e^{j \frac{2\pi}{T} n t}. $$

Now, the issue of exact convergence did bring Fourier much criticism from
the French Academy of Science (Laplace, Lagrange, Monge and LaCroix
comprised the review committee) for several years after its presentation in
1807. It was not resolved for almost a century, and its resolution is interesting
and important to understand from a practical viewpoint. See more in the
section on Gibbs Phenomena.


Fourier analysis is fundamental to understanding the behavior of signals 
and systems. This is a result of the fact that sinusoids are eigenfunctions of
linear, time-invariant (LTI) systems. This is to say that if we pass any
particular sinusoid through an LTI system, we get a scaled version of that
same sinusoid on the output. Then, since Fourier analysis allows us to
redefine the signals in terms of sinusoids, all we need to do is determine
how any given system affects all possible sinusoids (its transfer function)
and we have a complete understanding of the system. Furthermore, since 
we are able to define the passage of sinusoids through a system as 
multiplication of that sinusoid by the transfer function at the same 
frequency, we can convert the passage of any signal through a system from 
convolution (in time) to multiplication (in frequency). These ideas are what 
give Fourier analysis its power. 
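The eigenfunction property is easy to observe numerically in discrete time. The sketch below assumes NumPy and uses a short, hypothetical FIR impulse response `h` as the LTI system; the input is a complex exponential, and the output is compared with the input scaled by the transfer function at the same frequency.

```python
import numpy as np

h = np.array([0.25, 0.5, 0.25])   # hypothetical FIR impulse response
f = 0.1                           # dimensionless input frequency
n = np.arange(0, 16)

def x(n):
    """Complex exponential input, defined for every integer n."""
    return np.exp(1j * 2 * np.pi * f * n)

# Output from the convolution sum y[n] = sum_k h[k] x[n - k].
y = sum(h_k * x(n - k) for k, h_k in enumerate(h))

# Transfer function evaluated at the input frequency.
H = sum(h_k * np.exp(-1j * 2 * np.pi * f * k) for k, h_k in enumerate(h))

# The output is the same exponential, scaled by the complex number H.
is_eigenfunction = np.allclose(y, H * x(n))
print(is_eigenfunction)
```

The scale factor `H` changes with `f`, which is exactly the frequency-by-frequency multiplication that replaces convolution.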


Now, after hopefully having sold you on the value of this method of 
analysis, we must examine exactly what we mean by Fourier analysis. The 
four Fourier transforms that comprise this analysis are the Fourier Series, 
Continuous-Time Fourier Transform, Discrete-Time Fourier Transform and 
Discrete Fourier Transform. For this document, we will view the Laplace 
Transform and Z-Transform as simply extensions of the CTFT and DTFT 
respectively. All of these transforms act essentially the same way, by 
converting a signal in time to an equivalent signal in frequency (sinusoids). 
However, depending on the nature of a specific signal (i.e., whether it is
finite- or infinite-length and whether it is discrete- or continuous-time) there
is an appropriate transform to convert the signal into the frequency domain.
Below is a table of the four Fourier transforms and when each is 
appropriate. It also includes the relevant convolution for the specified 
space. 


Table of Fourier Representations

Transform | Time Domain | Frequency Domain | Convolution
Continuous-Time Fourier Series | $L^2([0, T))$ | $l^2(\mathbb{Z})$ | Continuous-Time Circular
Continuous-Time Fourier Transform | $L^2(\mathbb{R})$ | $L^2(\mathbb{R})$ | Continuous-Time Linear
Discrete-Time Fourier Transform | $l^2(\mathbb{Z})$ | $L^2([0, 2\pi))$ | Discrete-Time Linear
Discrete Fourier Transform | $l^2([0, N-1])$ | $l^2([0, N-1])$ | Discrete-Time Circular


Continuous Time Fourier Transform (CTFT) 
Details the Continuous-Time Fourier Transform. 


Introduction 


In this module, we will derive an expansion for any arbitrary continuous- 
time function, and in doing so, derive the Continuous Time Fourier 
Transform (CTFT). 


Since complex exponentials are eigenfunctions of linear time-invariant
(LTI) systems, calculating the output of an LTI system $\mathcal{H}$ given $e^{st}$ as an
input amounts to simple multiplication, where $H(s) \in \mathbb{C}$ is the eigenvalue
corresponding to $s$. As shown in the figure, a simple exponential input
would yield the output

Equation:
$$ y(t) = H(s)\, e^{st} $$


[missing_resource: simpleLTIsys.png] 


Using this and the fact that $\mathcal{H}$ is linear, calculating $y(t)$ for combinations
of complex exponentials is also straightforward.

$$ c_1 e^{s_1 t} + c_2 e^{s_2 t} \rightarrow c_1 H(s_1)\, e^{s_1 t} + c_2 H(s_2)\, e^{s_2 t} $$

$$ \sum_n c_n e^{s_n t} \rightarrow \sum_n c_n H(s_n)\, e^{s_n t} $$

The action of $H$ on an input such as those in the two equations above is
easy to explain. $\mathcal{H}$ independently scales each exponential component $e^{s_n t}$
by a different complex number $H(s_n) \in \mathbb{C}$. As such, if we can write a
function $f(t)$ as a combination of complex exponentials it allows us to
easily calculate the output of a system.


Now, we will look to use the power of complex exponentials to see how we 
may represent arbitrary signals in terms of a set of simpler functions by 
superposition of a number of complex exponentials. Below we will present
the Continuous-Time Fourier Transform (CTFT), commonly referred to
as just the Fourier Transform (FT). Because the CTFT deals with 
nonperiodic signals, we must find a way to include all real frequencies in 
the general equations. For the CTFT we simply utilize integration over real 
numbers rather than summation over integers in order to express the 
aperiodic signals. 


Fourier Transform Synthesis 


Joseph Fourier demonstrated that an arbitrary periodic $s(t)$ can be written as a
linear combination of harmonic complex sinusoids

Equation:
$$ s(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{j \omega_0 n t} $$
where $\omega_0 = \frac{2\pi}{T}$ is the fundamental frequency. For almost all $s(t)$ of
practical interest, there exists $c_n$ to make [link] true. If $s(t)$ is finite energy
($s(t) \in L^2[0, T]$), then the equality in [link] holds in the sense of energy
convergence; if $s(t)$ is continuous, then [link] holds pointwise. Also, if $s(t)$
meets some mild conditions (the Dirichlet conditions), then [link] holds
pointwise everywhere except at points of discontinuity.


The $c_n$, called the Fourier coefficients, tell us "how much" of the sinusoid
$e^{j \omega_0 n t}$ is in $s(t)$. The formula shows $s(t)$ as a sum of complex exponentials,
each of which is easily processed by an LTI system (since it is an
eigenfunction of every LTI system). Mathematically, it tells us that the set
of complex exponentials $\{ e^{j \omega_0 n t},\ n \in \mathbb{Z} \}$ forms a basis for the space
of T-periodic continuous time functions.


Equations 


Now, in order to take this useful tool and apply it to arbitrary non-periodic
signals, we will have to delve deeper into the use of the superposition
principle. Let $s_T(t)$ be a periodic signal having period $T$. We want to
consider what happens to this signal's spectrum as the period goes to
infinity. We denote the spectrum for any assumed value of the period by
$c_n(T)$. We calculate the spectrum according to the Fourier formula for a
periodic signal, known as the Fourier Series (for more on this derivation,
see the section on Fourier Series.)

Equation:
$$ c_n = \frac{1}{T} \int_{-T/2}^{T/2} s_T(t) \exp(-i \omega_0 t)\, dt $$

where $\omega_0 = \frac{2\pi n}{T}$ and where we have used a symmetric placement of the
integration interval about the origin for subsequent derivational
convenience. We vary the frequency index $n$ proportionally as we increase
the period, so that $f = n/T$ (and hence $\omega_0 = 2\pi f$) stays fixed. Define

Equation:
$$ S_T(f) \equiv T c_n = \int_{-T/2}^{T/2} s_T(t) \exp(-i \omega_0 t)\, dt $$

making the corresponding Fourier Series
Equation:
$$ s_T(t) = \sum_{n=-\infty}^{\infty} S_T(f) \exp(i \omega_0 t)\, \frac{1}{T} $$

As the period increases, the spectral lines become closer together, becoming
a continuum. Therefore,
Equation:
$$ \lim_{T \to \infty} s_T(t) \equiv s(t) = \int_{-\infty}^{\infty} S(f) \exp(i \omega_0 t)\, df $$

with

Equation:
$$ S(f) = \int_{-\infty}^{\infty} s(t) \exp(-i \omega_0 t)\, dt $$

Equation:
Continuous-Time Fourier Transform
$$ F(\Omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i \Omega t}\, dt $$

Equation:
Inverse CTFT
$$ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\Omega)\, e^{i \Omega t}\, d\Omega $$


Note: It is not uncommon to see the above formulas written slightly
differently. One of the most common differences is the way that the
exponential is written. The above equations use the radial frequency
variable $\Omega$ in the exponential, where $\Omega = 2\pi f$, but it is also common to
include the more explicit expression, $i 2\pi f t$, in the exponential.


Example:
We know from Euler's formula that
$$ \cos(\omega t) + \sin(\omega t) = \frac{1 - j}{2}\, e^{j \omega t} + \frac{1 + j}{2}\, e^{-j \omega t}. $$




Example Problems 


Exercise: 


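Problem: Find the Fourier Transform (CTFT) of the function

Equation:
$$ f(t) = \begin{cases} e^{-\alpha t} & \text{if } t \ge 0 \\ 0 & \text{otherwise} \end{cases} $$

Solution:

In order to calculate the Fourier transform, all we need to use is [link],
complex exponentials, and basic calculus. The last step assumes $\alpha > 0$, so
that the integrand decays as $t \to \infty$.
Equation:
$$ \begin{aligned}
F(\Omega) &= \int_{-\infty}^{\infty} f(t)\, e^{-i \Omega t}\, dt \\
&= \int_{0}^{\infty} e^{-\alpha t}\, e^{-i \Omega t}\, dt \\
&= \int_{0}^{\infty} e^{-t(\alpha + i \Omega)}\, dt \\
&= 0 - \frac{-1}{\alpha + i \Omega}
\end{aligned} $$
Equation:
$$ F(\Omega) = \frac{1}{\alpha + i \Omega} $$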
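The closed form $F(\Omega) = 1/(\alpha + i\Omega)$ can be sanity-checked by approximating the transform integral with a Riemann sum. This is a rough sketch, assuming NumPy and $\alpha > 0$; the values of `alpha` and `omega`, the step `dt`, and the truncation of the integral at $t = 20$ are illustrative choices.

```python
import numpy as np

alpha = 2.0                       # decay rate, assumed positive
omega = 3.0                       # test frequency in rad/s (illustrative)

dt = 1e-4
t = np.arange(0.0, 20.0, dt)      # exp(-alpha t) is negligible beyond t = 20
integrand = np.exp(-alpha * t) * np.exp(-1j * omega * t)

# Left Riemann sum approximating the integral from 0 to infinity.
F_numeric = np.sum(integrand) * dt
F_closed = 1.0 / (alpha + 1j * omega)

error = abs(F_numeric - F_closed)
print(error < 1e-3)
```

The small remaining error comes from the finite step size and shrinks as `dt` is reduced.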
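Exercise:
Problem:

Find the inverse Fourier transform of the ideal lowpass filter defined by
Equation:
$$ X(\Omega) = \begin{cases} 1 & \text{if } |\Omega| \le M \\ 0 & \text{otherwise} \end{cases} $$
Solution:

Here we will use [link] to find the inverse FT given that $t \ne 0$.
Equation:
$$ \begin{aligned}
x(t) &= \frac{1}{2\pi} \int_{-M}^{M} e^{i \Omega t}\, d\Omega \\
&= \frac{1}{2\pi} \left. \frac{1}{i t}\, e^{i \Omega t} \right|_{\Omega = -M}^{M} \\
&= \frac{1}{\pi t} \sin(M t)
\end{aligned} $$

Equation:
$$ x(t) = \frac{M}{\pi}\, \mathrm{sinc}\!\left(\frac{M t}{\pi}\right), \qquad \mathrm{sinc}(u) = \frac{\sin(\pi u)}{\pi u} $$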


Fourier Transform Summary 


Because complex exponentials are eigenfunctions of LTI systems, it is often 
useful to represent signals using a set of complex exponentials as a basis. 
The continuous time Fourier series synthesis formula expresses a 
continuous time, periodic function as the sum of continuous time, discrete 
frequency complex exponentials. 

Equation:
$$ f(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{j \omega_0 n t} $$

The continuous time Fourier series analysis formula gives the coefficients
of the Fourier series expansion.
Equation:
$$ c_n = \frac{1}{T} \int_{0}^{T} f(t)\, e^{-j \omega_0 n t}\, dt $$

In both of these equations $\omega_0 = \frac{2\pi}{T}$ is the fundamental frequency.


Discrete-Time Fourier Transform (DTFT) 
Discussion of Discrete-time Fourier Transforms. Topics include comparison 
with analog transforms and discussion of Parseval's theorem. 


The Fourier transform of the discrete-time signal $s(n)$ is defined to be
Equation:
$$ S(e^{i 2\pi f}) = \sum_{n=-\infty}^{\infty} s(n)\, e^{-i 2\pi f n} $$

Frequency here has no units. As should be expected, this definition is linear,
with the transform of a sum of signals equaling the sum of their transforms.
Real-valued signals have conjugate-symmetric spectra:

$$ S(e^{-i 2\pi f}) = S(e^{i 2\pi f})^{*}. $$
Exercise: 


Problem: 


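A special property of the discrete-time Fourier transform is that it is
periodic with period one: $S(e^{i 2\pi (f+1)}) = S(e^{i 2\pi f})$. Derive this
property from the definition of the DTFT.


Solution:
Equation:
$$ \begin{aligned}
S(e^{i 2\pi (f+1)}) &= \sum_{n=-\infty}^{\infty} s(n)\, e^{-i 2\pi (f+1) n} \\
&= \sum_{n=-\infty}^{\infty} e^{-i 2\pi n}\, s(n)\, e^{-i 2\pi f n} \\
&= \sum_{n=-\infty}^{\infty} s(n)\, e^{-i 2\pi f n} \\
&= S(e^{i 2\pi f})
\end{aligned} $$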
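For a finite-length signal the DTFT sum has finitely many terms, so both the periodicity and the conjugate symmetry can be checked numerically. The sketch below assumes NumPy; the helper `dtft`, the example signal, and the test frequency are hypothetical choices.

```python
import numpy as np

def dtft(s, n, f):
    """Evaluate sum_n s(n) exp(-i 2 pi f n) for a finite-length signal s."""
    return np.sum(s * np.exp(-1j * 2 * np.pi * f * n))

n = np.arange(0, 8)
s = 0.7 ** n                       # hypothetical real-valued, finite-length signal
f = 0.23                           # arbitrary test frequency

# Adding one to the frequency leaves the spectrum unchanged.
periodic = np.isclose(dtft(s, n, f), dtft(s, n, f + 1))

# For real-valued s, the spectrum at -f is the conjugate of the spectrum at f.
conjugate_symmetric = np.isclose(dtft(s, n, -f), np.conj(dtft(s, n, f)))
print(periodic, conjugate_symmetric)
```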


Because of this periodicity, we need only plot the spectrum over one period
to understand completely the spectrum's structure; typically, we plot the
spectrum over the frequency range $[-\frac{1}{2}, \frac{1}{2}]$. When the signal is
real-valued, we can further simplify our plotting chores by showing the
spectrum only over $[0, \frac{1}{2}]$; the spectrum at negative frequencies can be
derived from positive-frequency spectral values.


When we obtain the discrete-time signal via sampling an analog signal, the
Nyquist frequency corresponds to the discrete-time frequency $\frac{1}{2}$. To show
this, note that a sinusoid having a frequency equal to the Nyquist frequency
$\frac{1}{2T_s}$ has a sampled waveform that equals


$$\cos\left(2\pi \times \frac{1}{2T_s}\, nT_s\right) = \cos(\pi n) = (-1)^n$$


The exponential in the DTFT at frequency $\frac{1}{2}$ equals
$e^{-\frac{i2\pi n}{2}} = e^{-i\pi n} = (-1)^n$, meaning that discrete-time frequency equals
analog frequency multiplied by the sampling interval
Equation:


$$f_D = f_A T_s$$


$f_D$ and $f_A$ represent discrete-time and analog frequency variables,
respectively. The aliasing figure provides another way of deriving this
result. As the duration of each pulse in the periodic sampling signal $p_{T_s}(t)$
narrows, the amplitudes of the signal's spectral repetitions, which are
governed by the Fourier series coefficients of $p_{T_s}(t)$, become increasingly
equal. Examination of the periodic pulse signal reveals that as $\Delta$ decreases,
the value of $c_0$, the largest Fourier coefficient, decreases to zero:
$|c_0| = \frac{A\Delta}{T_s}$. Thus, to maintain a mathematically viable Sampling Theorem, the
amplitude $A$ must increase as $\frac{1}{\Delta}$, becoming infinitely large as the pulse
duration decreases. Practical systems use a small value of $\Delta$, say $0.1 \cdot T_s$,
and use amplifiers to rescale the signal. Thus, the sampled signal's spectrum
becomes periodic with period $\frac{1}{T_s}$. Thus, the Nyquist frequency $\frac{1}{2T_s}$
corresponds to the frequency $\frac{1}{2}$.


Example: 

Let's compute the discrete-time Fourier transform of the exponentially
decaying sequence $s(n) = a^n u(n)$, where $u(n)$ is the unit-step sequence.
Simply plugging the signal's expression into the Fourier transform formula,
Equation:


$$\begin{aligned}
S(e^{i2\pi f}) &= \sum_{n=-\infty}^{\infty} a^n u(n)\, e^{-i2\pi f n} \\
&= \sum_{n=0}^{\infty} \left(a e^{-i2\pi f}\right)^n
\end{aligned}$$


This sum is a special case of the geometric series. 
Equation: 


$$\sum_{n=0}^{\infty} \alpha^n = \frac{1}{1 - \alpha}, \quad |\alpha| < 1$$


Thus, as long as |a| < 1, we have our Fourier transform. 
Equation: 


$$S(e^{i2\pi f}) = \frac{1}{1 - a e^{-i2\pi f}}$$


Using Euler's relation, we can express the magnitude and phase of this 
spectrum. 
Equation: 


$$\left|S(e^{i2\pi f})\right| = \frac{1}{\sqrt{(1 - a\cos(2\pi f))^2 + a^2 \sin^2(2\pi f)}}$$


Equation: 


$$\angle\left(S(e^{i2\pi f})\right) = -\tan^{-1}\left(\frac{a \sin(2\pi f)}{1 - a\cos(2\pi f)}\right)$$


No matter what value of $a$ we choose, the above formulae clearly
demonstrate the periodic nature of the spectra of discrete-time signals.
[link] shows indeed that the spectrum is a periodic function. We need only
consider the spectrum between $-\frac{1}{2}$ and $\frac{1}{2}$ to unambiguously define it.

When $a > 0$, we have a lowpass spectrum (the spectrum diminishes as
frequency increases from 0 to $\frac{1}{2}$), with increasing $a$ leading to a greater
low frequency content; for $a < 0$, we have a highpass spectrum ([link]).
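
A short numerical sketch can confirm both the closed form and the lowpass/highpass behavior. It compares $\frac{1}{1 - a e^{-i2\pi f}}$ against a truncated version of the defining sum; the function names and the truncation length of 200 terms are illustrative assumptions.

```python
import cmath
import math

def spectrum_closed_form(a, f):
    """Closed-form DTFT of a^n u(n), valid for |a| < 1."""
    return 1.0 / (1.0 - a * cmath.exp(-2j * math.pi * f))

def spectrum_partial_sum(a, f, terms=200):
    """Truncated geometric series approximating the same DTFT."""
    ratio = a * cmath.exp(-2j * math.pi * f)
    return sum(ratio ** n for n in range(terms))

# Closed form and truncated sum agree closely for |a| < 1.
print(abs(spectrum_closed_form(0.5, 0.2) - spectrum_partial_sum(0.5, 0.2)))

# a > 0: magnitude is larger at f = 0 than at f = 1/2 (lowpass).
print(abs(spectrum_closed_form(0.5, 0.0)), abs(spectrum_closed_form(0.5, 0.5)))
# a < 0: the situation reverses (highpass).
print(abs(spectrum_closed_form(-0.5, 0.0)), abs(spectrum_closed_form(-0.5, 0.5)))
```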


Spectrum of exponential signal 


The spectrum of the exponential signal 
(a = 0.5) is shown over the frequency 
range [-2, 2], clearly demonstrating the 
periodicity of all discrete-time spectra. 
The angle has units of degrees. 


Spectra of exponential signals 


[Figure panels: Spectral Magnitude (dB) and Angle (degrees)]


The spectra of several
exponential signals are shown.
What is the apparent relationship
between the spectra for a = 0.5
and a = −0.5?


Example: 

Analogous to the analog pulse signal, let's find the spectrum of the length- 
N pulse sequence. 

Equation: 


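$$s(n) = \begin{cases} 1 & \text{if } 0 \le n \le N-1 \\ 0 & \text{otherwise} \end{cases}$$
The Fourier transform of this sequence has the form of a truncated
geometric series.
Equation:

$$S(e^{i2\pi f}) = \sum_{n=0}^{N-1} e^{-i2\pi f n}$$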


For the so-called finite geometric series, we know that 
Equation: 


$$\sum_{n=n_0}^{N+n_0-1} \alpha^n = \alpha^{n_0}\, \frac{1 - \alpha^N}{1 - \alpha}$$


for all values of $\alpha$.


Exercise: 
Problem: 
Derive this formula for the finite geometric series sum. The "trick" is 


to consider the difference between the series' sum and the sum of the 
series multiplied by a. 


Solution: 
$$\alpha \sum_{n=n_0}^{N+n_0-1} \alpha^n - \sum_{n=n_0}^{N+n_0-1} \alpha^n = \alpha^{N+n_0} - \alpha^{n_0}$$


which, after manipulation, yields the geometric sum formula. 


Applying this result yields ([link]).


Equation:
$$\begin{aligned}
S(e^{i2\pi f}) &= \frac{1 - e^{-i2\pi f N}}{1 - e^{-i2\pi f}} \\
&= e^{-i\pi f (N-1)}\, \frac{\sin(\pi f N)}{\sin(\pi f)}
\end{aligned}$$


The ratio of sine functions has the generic form of $\frac{\sin(Nx)}{\sin(x)}$, which is known
as the discrete-time sinc function $\operatorname{dsinc}(x)$. Thus, our transform can be
concisely expressed as $S(e^{i2\pi f}) = e^{-i\pi f (N-1)} \operatorname{dsinc}(\pi f)$. The
discrete-time pulse's spectrum contains many ripples, the number of which increases
with $N$, the pulse's duration.
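
The closed form for the pulse spectrum can be compared with a direct evaluation of the DTFT sum. The following is a minimal sketch; the function names and the test frequency are illustrative assumptions, and the closed form is only evaluated away from integer values of $f$, where $\sin(\pi f) \ne 0$.

```python
import cmath
import math

def pulse_dtft_direct(N, f):
    """Direct sum of e^{-i 2 pi f n} for n = 0, ..., N-1."""
    return sum(cmath.exp(-2j * math.pi * f * n) for n in range(N))

def pulse_dtft_closed_form(N, f):
    """e^{-i pi f (N-1)} * sin(pi f N) / sin(pi f), for non-integer f."""
    phase = cmath.exp(-1j * math.pi * f * (N - 1))
    return phase * math.sin(math.pi * f * N) / math.sin(math.pi * f)

N, f = 10, 0.07
print(abs(pulse_dtft_direct(N, f) - pulse_dtft_closed_form(N, f)))
```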
Spectrum of length-ten pulse 




The spectrum of a length-ten pulse 
is shown. Can you explain the 
rather complicated appearance of 
the phase? 


The inverse discrete-time Fourier transform is easily derived from the 
following relationship: 
Equation: 


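$$\int_{-\frac{1}{2}}^{\frac{1}{2}} e^{-i2\pi f m}\, e^{i2\pi f n}\, df = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \ne n \end{cases}$$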


Therefore, we find that 
Equation: 


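$$\begin{aligned}
\int_{-\frac{1}{2}}^{\frac{1}{2}} S(e^{i2\pi f})\, e^{i2\pi f n}\, df &= \int_{-\frac{1}{2}}^{\frac{1}{2}} \sum_{m} s(m)\, e^{-i2\pi f m}\, e^{i2\pi f n}\, df \\
&= \sum_{m} s(m) \int_{-\frac{1}{2}}^{\frac{1}{2}} e^{-i2\pi f (m-n)}\, df \\
&= s(n)
\end{aligned}$$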


The Fourier transform pairs in discrete-time are 
Equation: 


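$$S(e^{i2\pi f}) = \sum_{n=-\infty}^{\infty} s(n)\, e^{-i2\pi f n}$$


$$s(n) = \int_{-\frac{1}{2}}^{\frac{1}{2}} S(e^{i2\pi f})\, e^{i2\pi f n}\, df$$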


The properties of the discrete-time Fourier transform mirror those of the
analog Fourier transform. The DTFT properties table shows similarities and
differences. One important common property is Parseval's Theorem.
Equation:


$$\sum_{n=-\infty}^{\infty} |s(n)|^2 = \int_{-\frac{1}{2}}^{\frac{1}{2}} \left|S(e^{i2\pi f})\right|^2 df$$


To show this important property, we simply substitute the Fourier transform 
expression into the frequency-domain expression for power. 
Equation: 


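$$\begin{aligned}
\int_{-\frac{1}{2}}^{\frac{1}{2}} \left|S(e^{i2\pi f})\right|^2 df &= \int_{-\frac{1}{2}}^{\frac{1}{2}} \left(\sum_{n} s(n)\, e^{-i2\pi f n}\right) \left(\sum_{m} s(m)^{*}\, e^{i2\pi f m}\right) df \\
&= \sum_{n, m} s(n)\, s(m)^{*} \int_{-\frac{1}{2}}^{\frac{1}{2}} e^{i2\pi f (m-n)}\, df
\end{aligned}$$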


Using the orthogonality relation, the integral equals $\delta(m - n)$, where $\delta(n)$
is the unit sample. Thus, the double sum collapses into a single sum because
nonzero values occur only when $n = m$, giving Parseval's Theorem as a
result. We term $\sum_{n} s^2(n)$ the energy in the discrete-time signal $s(n)$ in
spite of the fact that discrete-time signals don't consume (or produce for that
matter) energy. This terminology is a carry-over from the analog world.
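
Parseval's Theorem is easy to check numerically for a finite-length signal. In the sketch below (helper names and the test sequence are illustrative assumptions), the frequency-domain integral is replaced by an average over uniformly spaced frequencies in $\left[-\frac{1}{2}, \frac{1}{2}\right)$; for a length-$L$ sequence this average is exact, up to rounding, whenever the number of frequency points is at least $L$.

```python
import cmath

def dtft(s, f):
    """DTFT of a finite sequence at frequency f (cycles per sample)."""
    return sum(x * cmath.exp(-2j * cmath.pi * f * n) for n, x in enumerate(s))

def energy_time(s):
    """Sum of |s(n)|^2."""
    return sum(abs(x) ** 2 for x in s)

def energy_freq(s, points=64):
    """Average of |S(e^{i 2 pi f})|^2 over `points` uniform frequencies in [-1/2, 1/2)."""
    return sum(abs(dtft(s, -0.5 + k / points)) ** 2 for k in range(points)) / points

s = [1.0, -2.0, 0.5, 3.0, -1.5]   # arbitrary test sequence
print(energy_time(s), energy_freq(s))
```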
Exercise: 


Problem: 


Suppose we obtained our discrete-time signal from values of the 
product $s(t)\,p_{T_s}(t)$, where the duration of the component pulses in
$p_{T_s}(t)$ is $\Delta$. How is the discrete-time signal energy related to the total
energy contained in s(t)? Assume the signal is bandlimited and that 
the sampling rate was chosen appropriate to the Sampling Theorem's 
conditions. 


Solution: 


If the sampling frequency exceeds the Nyquist frequency, the spectrum 
of the samples equals the analog spectrum, but over the normalized 
analog frequency $fT$. Thus, the energy in the sampled signal equals
the original signal's energy multiplied by $T$.


DFT as a Matrix Operation 


Matrix Review 
Recall: 


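• Vectors in $\mathbb{R}^N$:

$$\forall i,\; x_i \in \mathbb{R}: \quad x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}$$

• Vectors in $\mathbb{C}^N$:

$$\forall i,\; x_i \in \mathbb{C}: \quad x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}$$

• Transposition:
  1. transpose:
     $$x^T = \begin{pmatrix} x_0 & x_1 & \dots & x_{N-1} \end{pmatrix}$$
  2. conjugate:
     $$x^H = \begin{pmatrix} x_0^{*} & x_1^{*} & \dots & x_{N-1}^{*} \end{pmatrix}$$
• Inner product:
  1. real:
     $$x^T y = \sum_{i=0}^{N-1} x_i y_i$$
  2. complex:
     $$x^H y = \sum_{i=0}^{N-1} x_i^{*} y_i$$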
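• Matrix Multiplication:
$$Ax = \begin{pmatrix} a_{00} & a_{01} & \dots & a_{0,N-1} \\ a_{10} & a_{11} & \dots & a_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N-1,0} & a_{N-1,1} & \dots & a_{N-1,N-1} \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix} = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix}$$
$$y_k = \sum_{n=0}^{N-1} a_{kn} x_n$$
• Matrix Transposition:
$$A^T = \begin{pmatrix} a_{00} & a_{10} & \dots & a_{N-1,0} \\ a_{01} & a_{11} & \dots & a_{N-1,1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{0,N-1} & a_{1,N-1} & \dots & a_{N-1,N-1} \end{pmatrix}$$


Matrix transposition involves simply swapping the rows with columns.
$$A^H = \left(A^T\right)^{*}$$
The above equation is the Hermitian transpose.
$$\left[A^T\right]_{kn} = A_{nk}$$


$$\left[A^H\right]_{kn} = A_{nk}^{*}$$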


Representing DFT as Matrix Operation 


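Now let's represent the DFT in vector-matrix notation.


$$x = \begin{pmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{pmatrix}, \quad X = \begin{pmatrix} X[0] \\ X[1] \\ \vdots \\ X[N-1] \end{pmatrix}$$


Here $x$ is the vector of time samples and $X$ is the vector of DFT
coefficients. How are $x$ and $X$ related?


$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-i\frac{2\pi}{N}kn}$$
where
$$a_{kn} = \left(e^{-i\frac{2\pi}{N}}\right)^{kn} = W_N^{kn}$$
so


$$X = Wx$$


where $X$ is the DFT vector, $W$ is the matrix and $x$ the time domain vector.


$$W_{kn} = e^{-i\frac{2\pi}{N}kn}$$


IDFT:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{i\frac{2\pi}{N}nk}$$

where

$$\left(e^{i\frac{2\pi}{N}}\right)^{nk} = \left(W_N^{nk}\right)^{*}$$

The matrix with entries $\left(W_N^{nk}\right)^{*}$ is the Hermitian transpose $W^H$. So,
$$x = \frac{1}{N} W^H X$$


where $x$ is the time vector, $\frac{1}{N} W^H$ is the inverse DFT matrix, and $X$ is the
DFT vector.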
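
The matrix view of the DFT can be sketched in a few lines of plain Python. The helper names below (`dft_matrix`, `hermitian`, `matvec`, `idft_via_matrix`) are illustrative assumptions; the code builds $W$ with entries $e^{-i\frac{2\pi}{N}kn}$, applies it to a test vector, and inverts the result with $\frac{1}{N} W^H$.

```python
import cmath

def dft_matrix(N):
    """N x N DFT matrix W with entries W[k][n] = exp(-i 2 pi k n / N)."""
    return [[cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)]
            for k in range(N)]

def hermitian(A):
    """Hermitian (conjugate) transpose: result[k][n] = conj(A[n][k])."""
    return [[A[n][k].conjugate() for n in range(len(A))]
            for k in range(len(A[0]))]

def matvec(A, x):
    """Matrix-vector product y_k = sum_n A[k][n] * x[n]."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def idft_via_matrix(X):
    """Inverse DFT computed as (1/N) * W^H * X."""
    N = len(X)
    return [v / N for v in matvec(hermitian(dft_matrix(N)), X)]

x = [1.0, 2.0, 3.0, 4.0]           # arbitrary time-domain vector
X = matvec(dft_matrix(len(x)), x)  # X = W x
print(X)
print(idft_via_matrix(X))          # recovers x up to rounding error
```

This direct product costs on the order of $N^2$ operations; it is meant to illustrate the structure, not to replace a fast algorithm.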


Introduction 
Why sample?


This section introduces sampling. Sampling is the necessary foundation for
all digital signal processing and communication. Sampling can be defined
as the process of measuring an analog signal at distinct points.


Digital representation of analog signals offers advantages in terms of

• robustness towards noise, meaning we can send more bits/s
• use of flexible processing equipment, in particular the computer
• more reliable processing equipment
• easier to adapt complex algorithms


Claude E. Shannon 


Claude Elwood Shannon (1916-2001)


Claude Shannon has been called the father of information theory, mainly
due to his landmark papers on the "Mathematical theory of
communication". Harry Nyquist was the first to state the sampling theorem
in 1928, but it was not proven until Shannon proved it 21 years later in the
paper "Communications in the presence of noise".


Notation 
In this chapter we will be using the following notation 


• Original analog signal $x(t)$
• Sampling frequency $F_s$
• Sampling interval $T_s$ (Note that: $F_s = \frac{1}{T_s}$)
• Sampled signal $x_s(n)$. (Note that $x_s(n) = x(nT_s)$)
• Real angular frequency $\Omega$
• Digital angular frequency $\omega$. (Note that: $\omega = \Omega T_s$)


The Sampling Theorem 


Note: When sampling an analog signal the sampling frequency must be 
greater than twice the highest frequency component of the analog signal to 
be able to reconstruct the original signal from the sampled version. 


Proof 


Note: In order to recover the signal $x(t)$ from its samples exactly, it is
necessary to sample $x(t)$ at a rate greater than twice its highest frequency
component.


Introduction 


As mentioned earlier, sampling is the necessary foundation when we want
to apply digital signal processing to analog signals.


Here we present the proof of the sampling theorem. The proof is divided in
two. First we find an expression for the spectrum of the signal resulting
from sampling the original signal $x(t)$. Next we show that the signal $x(t)$
can be recovered from the samples. It is often easier to carry out a proof in
the frequency domain, and this is also the case here.

Key points in the proof


• We find an equation for the spectrum of the sampled signal
• We find a simple method to reconstruct the original signal
• The sampled signal has a periodic spectrum...
• ...and the period is $2\pi \times F_s$


Proof part 1 - Spectral considerations 


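By sampling $x(t)$ every $T_s$ seconds we obtain $x_s(n)$. The inverse Fourier
transform of this time-discrete signal is
Equation:


$$x_s(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X_s(e^{i\omega})\, e^{i\omega n}\, d\omega$$


For convenience we express the equation in terms of the real angular
frequency $\Omega$ using $\omega = \Omega T_s$. We then obtain
Equation:


$$x_s(n) = \frac{T_s}{2\pi} \int_{-\frac{\pi}{T_s}}^{\frac{\pi}{T_s}} X_s(e^{i\Omega T_s})\, e^{i\Omega T_s n}\, d\Omega$$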


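The inverse Fourier transform of a continuous signal is
Equation:


$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(i\Omega)\, e^{i\Omega t}\, d\Omega$$


From this equation we find an expression for $x(nT_s)$
Equation:


$$x(nT_s) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(i\Omega)\, e^{i\Omega n T_s}\, d\Omega$$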


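To account for the difference in region of integration we split the integration
in [link] into subintervals of length $\frac{2\pi}{T_s}$ and then take the sum over the
resulting integrals to obtain the complete area.

Equation:


$$x(nT_s) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \int_{\frac{(2k-1)\pi}{T_s}}^{\frac{(2k+1)\pi}{T_s}} X(i\Omega)\, e^{i\Omega n T_s}\, d\Omega$$


Then we change the integration variable, setting $\Omega = \eta + \frac{2\pi k}{T_s}$


Equation:


$$x(nT_s) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \int_{-\frac{\pi}{T_s}}^{\frac{\pi}{T_s}} X\left(i\left(\eta + \frac{2\pi k}{T_s}\right)\right) e^{i\left(\eta + \frac{2\pi k}{T_s}\right) n T_s}\, d\eta$$


We obtain the final form by observing that $e^{i2\pi k n} = 1$, reinserting $\eta = \Omega$
and multiplying by $\frac{T_s}{T_s}$


Equation:
$$x(nT_s) = \frac{T_s}{2\pi} \int_{-\frac{\pi}{T_s}}^{\frac{\pi}{T_s}} \sum_{k=-\infty}^{\infty} \frac{1}{T_s}\, X\left(i\left(\Omega + \frac{2\pi k}{T_s}\right)\right) e^{i\Omega n T_s}\, d\Omega$$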


To make $x_s(n) = x(nT_s)$ for all values of $n$, the integrands in [link] and
[link] have to agree, that is


Equation:
$$X_s(e^{i\Omega T_s}) = \frac{1}{T_s} \sum_{k=-\infty}^{\infty} X\left(i\left(\Omega + \frac{2\pi k}{T_s}\right)\right)$$


This is a central result. We see that the digital spectrum consists of a sum of
shifted versions of the original, analog spectrum. Observe the periodicity!


We can also express this relation in terms of the digital angular frequency
$\omega = \Omega T_s$
Equation:


$$X_s(e^{i\omega}) = \frac{1}{T_s} \sum_{k=-\infty}^{\infty} X\left(i\, \frac{\omega + 2\pi k}{T_s}\right)$$


This concludes the first part of the proof. Now we want to find a
reconstruction formula, so that we can recover $x(t)$ from $x_s(n)$.


Proof part II - Signal reconstruction 


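For a bandlimited signal the inverse Fourier transform is
Equation:


$$x(t) = \frac{1}{2\pi} \int_{-\frac{\pi}{T_s}}^{\frac{\pi}{T_s}} X(i\Omega)\, e^{i\Omega t}\, d\Omega$$


In the interval we are integrating over we have $X_s(e^{i\Omega T_s}) = \frac{X(i\Omega)}{T_s}$.
Substituting this relation into [link] we get


Equation:


$$x(t) = \frac{T_s}{2\pi} \int_{-\frac{\pi}{T_s}}^{\frac{\pi}{T_s}} X_s(e^{i\Omega T_s})\, e^{i\Omega t}\, d\Omega$$


Using the DTFT relation for $X_s(e^{i\Omega T_s})$ we have
Equation:


$$x(t) = \frac{T_s}{2\pi} \int_{-\frac{\pi}{T_s}}^{\frac{\pi}{T_s}} \sum_{n=-\infty}^{\infty} x_s(n)\, e^{-i\Omega n T_s}\, e^{i\Omega t}\, d\Omega$$


Interchanging integration and summation (under the assumption of
convergence) leads to
Equation:

$$x(t) = \frac{T_s}{2\pi} \sum_{n=-\infty}^{\infty} x_s(n) \int_{-\frac{\pi}{T_s}}^{\frac{\pi}{T_s}} e^{i\Omega (t - nT_s)}\, d\Omega$$

Finally we perform the integration and arrive at the important
reconstruction formula
Equation:

$$x(t) = \sum_{n=-\infty}^{\infty} x_s(n)\, \frac{\sin\left(\pi\left(\frac{t}{T_s} - n\right)\right)}{\pi\left(\frac{t}{T_s} - n\right)}$$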

(Thanks to R.Loos for pointing out an error in the proof.) 
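
The ideal reconstruction that this proof arrives at is sinc interpolation of the samples, $x(t) = \sum_n x_s(n)\, \frac{\sin(\pi(t/T_s - n))}{\pi(t/T_s - n)}$. The sketch below applies it to a sampled 1 Hz cosine. The signal, the sampling rate of 8 Hz, and the truncation of the infinite sum to indices $-400 \dots 400$ are illustrative assumptions, so values between sampling instants are only approximate.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi u) / (pi u), with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples, first_index, Ts, t):
    """Truncated sinc interpolation: sum over n of x_s(n) * sinc(t/Ts - n)."""
    return sum(x * sinc(t / Ts - (first_index + i)) for i, x in enumerate(samples))

Fs = 8.0                 # sampling frequency in Hz (well above twice 1 Hz)
Ts = 1.0 / Fs
f0 = 1.0                 # frequency of the analog cosine in Hz
first_index = -400
samples = [math.cos(2 * math.pi * f0 * n * Ts) for n in range(first_index, 401)]

t = 0.3                  # a time instant between two samples
print(reconstruct(samples, first_index, Ts, t), math.cos(2 * math.pi * f0 * t))
```

At the sampling instants the interpolation returns the samples themselves; in between, the accuracy improves slowly as more terms of the sum are kept.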


Summary 


Illustrations 


In this module we illustrate the processes involved in sampling and
reconstruction. To see how all these processes work together as a whole,
take a look at the system view. In Sampling and reconstruction with Matlab
we provide a Matlab script for download. The Matlab script shows the
process of sampling and reconstruction live.


Basic examples 


Example: 
To sample an analog signal with 3000 Hz as the highest frequency 
component requires sampling at 6000 Hz or above. 


Example: 

The sampling theorem can also be applied in two dimensions, i.e. for 
image analysis. A 2D sampling theorem has a simple physical 
interpretation in image analysis: Choose the sampling interval such that it 
is less than or equal to half of the smallest interesting detail in the image. 


The process of sampling 


We start off with an analog signal. This can for example be the sound 
coming from your stereo at home or your friend talking. 


The signal is then sampled uniformly. Uniform sampling implies that we
sample every $T_s$ seconds. In [link] we see an analog signal. The analog
signal has been sampled at times $t = nT_s$.


Analog signal, samples are marked with dots. 


In signal processing it is often more convenient and easier to work in the
frequency domain. So let's look at the signal in the frequency domain, [link].
For illustration purposes we take the frequency content of the signal as a
triangle. (If you Fourier transform the signal in [link] you will not get such
a nice triangle.)


The spectrum $X(i\Omega)$.


Notice that the signal in [link] is bandlimited. We can see that the signal is
bandlimited because $X(i\Omega)$ is zero outside the interval $[-\Omega_g, \Omega_g]$.
Equivalently we can state that the signal has no angular frequencies above
$\Omega_g$, corresponding to no frequencies above $F_g = \frac{\Omega_g}{2\pi}$.

Now let's take a look at the sampled signal in the frequency domain. While
proving the sampling theorem we found that the spectrum of the sampled
signal consists of a sum of shifted versions of the analog spectrum.
Mathematically this is described by the following equation:


Equation:
$$X_s(e^{i\Omega T_s}) = \frac{1}{T_s} \sum_{k=-\infty}^{\infty} X\left(i\left(\Omega + \frac{2\pi k}{T_s}\right)\right)$$


Sampling fast enough 


In [link] we show the result of sampling $x(t)$ according to the sampling
theorem. This means that when sampling the signal in [link]/[link] we use
$F_s > 2F_g$. Observe in [link] that we have the same spectrum as in [link] for
$\Omega \in [-\Omega_g, \Omega_g]$, except for the scaling factor $\frac{1}{T_s}$. This is a consequence of
the sampling frequency. As mentioned in the proof the spectrum of the
sampled signal is periodic with period $2\pi F_s = \frac{2\pi}{T_s}$.


The spectrum $X_s$. Sampling frequency is OK.


So now we are, according to the sampling theorem, able to reconstruct the
original signal exactly. How we can do this will be explored further down
under reconstruction. But first we will take a look at what happens when we
sample too slowly.


Sampling too slowly 


If we sample $x(t)$ too slowly, that is $F_s < 2F_g$, we will get overlap
between the repeated spectra, see [link]. According to [link] the resulting
spectrum is the sum of these. This overlap gives rise to the concept of
aliasing.


Note: If the sampling frequency is less than twice the highest frequency 
component, then frequencies in the original signal that are above half the 
sampling rate will be "aliased" and will appear in the resulting signal as 
lower frequencies. 
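
A small numerical sketch makes the aliasing statement concrete. With an assumed sampling rate of 1000 Hz, a 700 Hz cosine (above half the sampling rate) produces exactly the same samples as a 300 Hz cosine, so the two cannot be told apart after sampling. The rate, the frequencies, and the helper name are illustrative choices.

```python
import math

def sample_cosine(freq_hz, fs_hz, count):
    """Samples cos(2 pi f n / Fs) for n = 0, ..., count-1."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]

fs = 1000.0                               # sampling frequency in Hz
too_high = sample_cosine(700.0, fs, 16)   # above Fs/2, will alias
alias = sample_cosine(300.0, fs, 16)      # Fs - 700 Hz = 300 Hz

# The largest difference between the two sample sequences is at rounding level.
print(max(abs(a - b) for a, b in zip(too_high, alias)))
```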


The consequence of aliasing is that we cannot recover the original signal, so
aliasing has to be avoided. Sampling too slowly will produce a sequence
$x_s(n)$ that could have originated from a number of signals. So there is no
chance of recovering the original signal. To learn more about aliasing, take
a look at this module. (Includes an applet for demonstration!)


The spectrum $X_s$. Sampling frequency is too low.


To avoid aliasing we have to sample fast enough. But if we can't sample fast
enough (possibly due to costs) we can include an anti-aliasing filter. This
will not enable us to get an exact reconstruction but can still be a good
solution.


Note: An anti-aliasing filter is typically a low-pass filter that is applied
before sampling to ensure that no components with frequencies greater than
half the sampling frequency remain.


Example: 

The stagecoach effect 

In older western movies you can observe aliasing on a stagecoach when it
starts to roll. At first the spokes appear to turn forward, but as the
stagecoach increases its speed the spokes appear to turn backward. This
comes from the fact that the sampling rate, here the number of frames per
second, is too low. We can view each frame as a sample of an image that is
changing continuously in time. (Applet illustrating the stagecoach effect)


Reconstruction 


Given the signal in [link] we want to recover the original signal, but the 
question is how? 


When there is no overlapping in the spectrum, the spectral component given
by $k = 0$ (see [link]) is equal to the spectrum of the analog signal. This
offers an opportunity to use a simple reconstruction process. Remember
what you have learned about filtering. What we want is to change the signal in
[link] into that of [link]. To achieve this we have to remove all the extra
components generated in the sampling process. To remove the extra
components we apply an ideal analog low-pass filter as shown in [link]. As
we see, the ideal filter is rectangular in the frequency domain. A rectangle in
the frequency domain corresponds to a sinc function in the time domain (and
vice versa).


$H(i\Omega)$: The ideal reconstruction filter.


Then we have reconstructed the original spectrum, and as we know, if two
signals are identical in the frequency domain, they are also identical in
the time domain. End of reconstruction.


Conclusions 


The Shannon sampling theorem requires that the input signal prior to
sampling is band-limited to at most half the sampling frequency. Under this
condition the samples give an exact signal representation. It is truly
remarkable that such a broad and useful class of signals can be represented
that easily!


We also looked into the problem of reconstructing the signal from its
samples. Again the simplicity of the principle is striking: linear filtering by
an ideal low-pass filter will do the job. However, the ideal filter is
impossible to create, but that is another story...


Systems view of sampling and reconstruction 


Ideal reconstruction system 


[link] shows the ideal reconstruction system based on the results of the 
Sampling theorem proof. 


[link] consists of a sampling device which produces a time-discrete
sequence $x_s(n)$. The reconstruction filter, $h(t)$, is an ideal analog sinc
filter, with $h(t) = \operatorname{sinc}\left(\frac{t}{T_s}\right)$. We can't apply the time-discrete sequence
$x_s(n)$ directly to the analog filter $h(t)$. To solve this problem we turn the
sequence into an analog signal using delta functions. Thus we write


$$x_s(t) = \sum_{n=-\infty}^{\infty} x_s(n)\, \delta(t - nT_s)$$


Ideal reconstruction system


But when will the system produce an output $\hat{x}(t) = x(t)$? According to the
sampling theorem we have $\hat{x}(t) = x(t)$ when the sampling frequency, $F_s$,
is at least twice the highest frequency component of $x(t)$.


Ideal system including anti-aliasing 


To be sure that the reconstructed signal is free of aliasing it is customary to 
apply a lowpass filter, an anti-aliasing filter, before sampling as shown in 
[link]. 




Ideal reconstruction system with anti-aliasing filter 


Again we ask the question of when the system will produce an output
$\hat{x}(t) = s(t)$? If the signal is entirely confined within the passband of the
lowpass filter we will get perfect reconstruction if $F_s$ is high enough.


But if the anti-aliasing filter removes the "higher" frequencies (which in
fact is the job of the anti-aliasing filter), we will never be able to exactly
reconstruct the original signal, $s(t)$. If we sample fast enough we can
reconstruct $x(t)$, which in most cases is satisfactory.


The reconstructed signal, $\hat{x}(t)$, will not have aliased frequencies. This is
essential for further use of the signal.


Reconstruction with hold operation 


To make our reconstruction system realizable there are many things to look
into. Among them is the fact that any practical reconstruction system must
input finite length pulses into the reconstruction filter. This can be
accomplished by the hold operation. To alleviate the distortion caused by
the hold operator we apply the output from the hold device to a compensator.
The compensation can be as accurate as we wish; this is a cost and
application consideration.


[Block diagram: Sampling, Hold, Compensate]


More practical reconstruction system with a hold component 


By the use of the hold component the reconstruction will not be exact, but 
as mentioned above we can get as close as we want. 


applet Exercises 


Sampling CT Signals: A Frequency Domain Perspective 


Understanding Sampling in the Frequency Domain 


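We want to relate $x_c(t)$ directly to $x[n]$. Compute the CTFT of


$$x_s(t) = \sum_{n=-\infty}^{\infty} x_c(nT)\, \delta(t - nT)$$
Equation:
$$\begin{aligned}
X_s(\Omega) &= \int_{-\infty}^{\infty} \left( \sum_{n=-\infty}^{\infty} x_c(nT)\, \delta(t - nT) \right) e^{-i\Omega t}\, dt \\
&= \sum_{n=-\infty}^{\infty} x_c(nT) \int_{-\infty}^{\infty} \delta(t - nT)\, e^{-i\Omega t}\, dt \\
&= \sum_{n=-\infty}^{\infty} x[n]\, e^{-i\Omega n T} \\
&= \sum_{n=-\infty}^{\infty} x[n]\, e^{-i\omega n} \\
&= X(\omega)
\end{aligned}$$


where $\omega = \Omega T$ and $X(\omega)$ is the DTFT of $x[n]$.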


Note: 


Equation: 
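$$\begin{aligned}
X(\omega) &= \frac{1}{T} \sum_{k} X_c(\Omega - k\Omega_s) \\
&= \frac{1}{T} \sum_{k} X_c\left(\frac{\omega - 2\pi k}{T}\right)
\end{aligned}$$


where this last part is $2\pi$-periodic.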


Sampling 




Example: 
Speech 
Speech is intelligible if bandlimited by a CT lowpass filter to the band ±4
kHz. We can sample speech as slowly as _____?

[Figure: the CTFT of $x(t)$ and the DTFT $X(\omega)$ of the sampled signal]


Note that there is no mention of $T$ or $\Omega_s$!


Relating x[n] to sampled x(t) 


Recall the following equality: 


$$x_s(t) = \sum_{n=-\infty}^{\infty} x(nT)\, \delta(t - nT) \;\leftrightarrow\; X_s(\Omega)$$

[Figure: time axis and frequency axis]

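Recall the CTFT relation:
Equation:


$$x(\alpha t) \;\leftrightarrow\; \frac{1}{\alpha}\, X\left(\frac{\Omega}{\alpha}\right)$$


where $\alpha$ is a scaling of time and $\frac{1}{\alpha}$ is a scaling in frequency.
Equation:


$$X_s(\Omega) \equiv X(\Omega T)$$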


The DFT: Frequency Domain with a Computer Analysis 


Introduction 


We just covered ideal (and non-ideal) (time) sampling of CT signals. This
enabled DT signal processing solutions for CT applications ([link]):


[Figure: DT system with input $x[n]$ and output $y[n]$]


Much of the theoretical analysis of such systems relied on frequency
domain representations. How do we carry out these frequency domain
analyses on the computer? Recall the following relationships:


DTFT 


where w and {2 are continuous frequency variables. 


Sampling DTFT 


Consider the DTFT of a discrete-time (DT) signal $x[n]$. Assume $x[n]$ is of
finite duration $N$ (i.e., an $N$-point signal).
Equation:

$$X(\omega) = \sum_{n=0}^{N-1} x[n]\, e^{-i\omega n}$$

where $X(\omega)$ is the continuous function that is indexed by the real-valued
parameter $-\pi \le \omega \le \pi$. The other function, $x[n]$, is a discrete function that
is indexed by integers.


We want to work with $X(\omega)$ on a computer. Why not just sample $X(\omega)$?
Equation:


$$X[k] = X\left(\frac{2\pi}{N} k\right)$$


In [link] we sampled at $\omega = \frac{2\pi}{N} k$ where $k = \{0, 1, \dots, N-1\}$ and $X[k]$
for $k = \{0, \dots, N-1\}$ is called the Discrete Fourier Transform (DFT)
of $x[n]$.


Example: 

Finite Duration DT Signal 

[missing resource: sec8_fig2.png] 

The DTFT of the image in [link] is written as follows:
Equation:


$$X(\omega) = \sum_{n=0}^{N-1} x[n]\, e^{-i\omega n}$$


where $\omega$ is any $2\pi$-interval, for example $-\pi \le \omega \le \pi$.

Sample $X(\omega)$

[missing resource: sec8_fig3.png]

where again we sampled at $\omega = \frac{2\pi}{M} k$ where $k = \{0, 1, \dots, M-1\}$. For
example, we take $M = 10$. In the following section we will discuss in more
detail how we should choose $M$, the number of samples in the $2\pi$ interval.
(This is precisely how we would plot $X(\omega)$ in Matlab.)


Choosing M 


Case 1 


Given $N$ (length of $x[n]$), choose $M \gg N$ to obtain a dense sampling of
the DTFT ([link]):


Case 2 

Choose M as small as possible (to minimize the amount of computation). 

In general, we require $M \ge N$ in order to represent all information in
$x[n]$, $n = \{0, \dots, N-1\}$.


Let's concentrate on $M = N$:

$$x[n] \;\overset{\text{DFT}}{\longleftrightarrow}\; X[k]$$

for $n = \{0, \dots, N-1\}$ and $k = \{0, \dots, N-1\}$
$N$ numbers ↔ $N$ numbers
Discrete Fourier Transform (DFT) 


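Define
Equation:

$$X[k] \equiv X\left(\frac{2\pi k}{N}\right)$$

where $N = \operatorname{length}(x[n])$ and $k = \{0, \dots, N-1\}$. In this case, $M = N$.
Equation:
DFT

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-i\frac{2\pi}{N}kn}$$

Equation:
Inverse DFT (IDFT)

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{i\frac{2\pi}{N}kn}$$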

Interpretation 


Represent $x[n]$ in terms of a sum of $N$ complex sinusoids of amplitudes
$X[k]$ and frequencies


$$\forall k,\; k \in \{0, \dots, N-1\}: \quad \omega_k = \frac{2\pi k}{N}$$


Note: Fourier Series with fundamental frequency at $\frac{2\pi}{N}$

Remark 1 


IDFT treats $x[n]$ as though it were $N$-periodic.


Equation:

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{i\frac{2\pi}{N}kn}$$

where $n \in \{0, \dots, N-1\}$
Exercise:


Problem: What about other values of $n$?
Solution:


$x[n + N] = ???$


Remark 2 


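Proof that the IDFT inverts the DFT for $n \in \{0, \dots, N-1\}$
Equation:
$$\begin{aligned}
\frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{i\frac{2\pi}{N}kn} &= \frac{1}{N} \sum_{k=0}^{N-1} \sum_{m=0}^{N-1} x[m]\, e^{-i\frac{2\pi}{N}km}\, e^{i\frac{2\pi}{N}kn} \\
&= ???
\end{aligned}$$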


Example: 
Computing DFT 
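Given the following discrete-time signal ([link]) with $N = 4$, we will
compute the DFT using two different methods (the DFT Formula and
Sample DTFT):


x[n]


1. DFT Formula
Equation:


$$\begin{aligned}
X[k] &= \sum_{n=0}^{N-1} x[n]\, e^{-i\frac{2\pi}{N}kn} \\
&= 1 + e^{-i\frac{2\pi}{4}k} + e^{-i\frac{2\pi}{4}2k} + e^{-i\frac{2\pi}{4}3k} \\
&= 1 + e^{-i\frac{\pi}{2}k} + e^{-i\pi k} + e^{-i\frac{3\pi}{2}k}
\end{aligned}$$


Using the above equation, we can solve and get the following results:


$X[0] = 4$
$X[1] = 0$
$X[2] = 0$
$X[3] = 0$


2. Sample DTFT. Using the same figure, [link], we will take the DTFT
of the signal and get the following equations:


Equation:
$$\begin{aligned}
X(\omega) &= \sum_{n=0}^{3} e^{-i\omega n} \\
&= \frac{1 - e^{-i4\omega}}{1 - e^{-i\omega}} \\
&= ???
\end{aligned}$$
Our sample points will be:

$$\omega_k = \frac{2\pi k}{4} = \frac{\pi}{2} k$$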


where k = {0, 1, 2, 3} ([link]). 
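
Both methods of this example can be checked numerically. The sketch below assumes the signal is $x[n] = 1$ for $n = 0, \dots, 3$, which is what the four unit-amplitude terms in the DFT formula and the result $X[0] = 4$ imply. The function names are illustrative, and the closed-form DTFT is given its limiting value 4 at $\omega = 0$, where the ratio is of the form 0/0.

```python
import cmath
import math

def dft_by_formula(x):
    """Method 1: evaluate the DFT sum directly."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def dtft_of_length4_pulse(w):
    """Method 2: closed-form DTFT (1 - e^{-i 4 w}) / (1 - e^{-i w}) of four unit samples."""
    if w == 0.0:
        return complex(4.0)          # limiting value at w = 0
    return (1 - cmath.exp(-4j * w)) / (1 - cmath.exp(-1j * w))

x = [1.0, 1.0, 1.0, 1.0]
by_formula = dft_by_formula(x)
by_sampling = [dtft_of_length4_pulse(2 * math.pi * k / 4) for k in range(4)]

print([round(abs(v), 9) for v in by_formula])
print([round(abs(v), 9) for v in by_sampling])
```

Up to rounding error, both lists have magnitude 4 at $k = 0$ and 0 at $k = 1, 2, 3$.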
Periodicity of the DFT


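DFT $X[k]$ consists of samples of DTFT, so $X(\omega)$, a $2\pi$-periodic DTFT
signal, can be converted to $X[k]$, an $N$-periodic DFT.
Equation:


$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-i\frac{2\pi}{N}kn}$$


where $e^{-i\frac{2\pi}{N}kn}$ is an $N$-periodic basis function (See [link]).


N Samples N Samples


Also, recall,
Equation:


$$\begin{aligned}
x[n] &= \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{i\frac{2\pi}{N}kn} \\
&= \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{i\frac{2\pi}{N}k(n + mN)} \\
&= ???
\end{aligned}$$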


Example: 
Illustration 


Note: When we deal with the DFT, we need to remember that, in effect,
this treats the signal as an $N$-periodic sequence.


A Sampling Perspective 


Think of sampling the continuous function X(w), as depicted in [link]. 
S(w) will represent the sampling function applied to X(w) and is illustrated 
in [link] as well. This will result in our discrete-time sequence, X[k]. 


[Figure: N points (1 period)]


Note: Remember the multiplication in the frequency domain is equal to 
convolution in the time domain! 


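Inverse DTFT of $S(\omega)$


Equation:

$$S(\omega) = \sum_{k=-\infty}^{\infty} \delta\left(\omega - \frac{2\pi k}{N}\right)$$

Given the above equation, we can take the DTFT and get the following
equation:
Equation:


$$N \sum_{m=-\infty}^{\infty} \delta[n - mN] \equiv S[n]$$
Exercise:
Problem: Why does [link] equal $S[n]$?
Solution:


$S[n]$ is $N$-periodic, so it has the following Fourier Series:
Equation:


$$c_k = \int_{-\frac{N}{2}}^{\frac{N}{2}} \delta[n]\, e^{-i\frac{2\pi}{N}kn}\, dn = 1$$
Equation:

$$S[n] = \sum_{k=-\infty}^{\infty} e^{-i\frac{2\pi}{N}kn}$$

where the DTFT of the exponential in the above equation is equal to
$\delta\left(\omega - \frac{2\pi k}{N}\right)$.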


So, in the time-domain we have ([link]): 


N*xin] 


Connections 


[Figure: Fourier series; discrete in frequency]


Combine signals in [link] to get signals in [link].


[Figure: discrete and periodic in frequency; DFT]

Discrete-Time Processing of CT Signals 


DT Processing of CT Signals 


DSP System 




Analysis 
Equation: 
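$$Y_c(\Omega) = H_{LP}(\Omega)\, Y(\Omega T)$$
where we know that $Y(\omega) = X(\omega)\, G(\omega)$ and $G(\omega)$ is the frequency
response of the DT LTI system. Also, remember that
$$\omega \equiv \Omega T$$
So,


Equation:


$$Y_c(\Omega) = H_{LP}(\Omega)\, G(\Omega T)\, X(\Omega T)$$


where $Y_c(\Omega)$ and $H_{LP}(\Omega)$ are CTFTs and $G(\Omega T)$ and $X(\Omega T)$ are DTFTs.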


Note: 


OR 


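Therefore our final output signal, $Y_c(\Omega)$, will be:
Equation:


$$Y_c(\Omega) = H_{LP}(\Omega)\, G(\Omega T) \left( \frac{1}{T} \sum_{k=-\infty}^{\infty} X_c(\Omega - k\Omega_s) \right)$$


Now, if $X_c(\Omega)$ is bandlimited to $\left[-\frac{\Omega_s}{2}, \frac{\Omega_s}{2}\right]$ and we use the usual lowpass
reconstruction filter in the D/A, [link]: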


[missing_resource: sec9_fig2.png] 


Then, 
Equation: 


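$$Y_c(\Omega) = \begin{cases} G(\Omega T)\, X_c(\Omega) & \text{if } |\Omega| < \frac{\Omega_s}{2} \\ 0 & \text{otherwise} \end{cases}$$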


Summary 


For bandlimited signals sampled at or above the Nyquist rate, we can relate 
the input and output of the DSP system by: 
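Equation:

$$Y_c(\Omega) = G_{eff}(\Omega)\, X_c(\Omega)$$

where


$$G_{eff}(\Omega) = \begin{cases} G(\Omega T) & \text{if } |\Omega| < \frac{\Omega_s}{2} \\ 0 & \text{otherwise} \end{cases}$$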


Note 


$G_{eff}(\Omega)$ is LTI if and only if the following two conditions are satisfied:


1. $G(\omega)$ is LTI (in DT).
2. $X_c(\Omega)$ is bandlimited and the sampling rate is equal to or greater than
   Nyquist. For example, if we had a simple pulse described by


   $$x_c(t) = u(t - T_0) - u(t - T_1)$$


   where $T_1 > T_0$. If the sampling period $T > T_1 - T_0$, then some
   samples might "miss" the pulse while others might not be "missed."
   This is what we term time-varying behavior.


Example: 


If $\frac{2\pi}{T} > 2B$ and $\omega_1 < BT$, determine and sketch $Y_c(\Omega)$ using [link].


Application: 60Hz Noise Removal 


EKG Voltage Signal 


Unfortunately, in real-world situations electrodes also pick up ambient 60
Hz signals from lights, computers, etc. In fact, usually this "60 Hz noise" is
much greater in amplitude than the EKG signal shown in [link]. [link]
shows the EKG signal; it is barely noticeable as it has become
overwhelmed by noise.


A x(t) 


Our EKG signal, y(t), is 
overwhelmed by noise. 


DSP Solution 


[Block diagram: noisy EKG input $x(t)$, DT LTI "Notch Filter", analog output]


[missing_resource: sec9_fig7b.png] 


Sampling Period/Rate 


First we must note that $|Y(\Omega)|$ is bandlimited to ±60 Hz. Therefore, the
minimum rate should be 120 Hz. In order to get the best results we should
set


$$f_s = 240 \text{ Hz}$$


$$\Omega_s = 2\pi \times 240\ \frac{\text{rad}}{\text{s}}$$


[missing_resource: sec9_fig8.png] 


Digital Filter 


Therefore, we want to design a digital filter that will remove the 60Hz 
component and preserve the rest. 
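
One very simple way to remove a 60 Hz tone at a sampling rate of 240 Hz is sketched below. At that rate 60 Hz maps to the digital frequency $\omega_0 = 2\pi \cdot 60 / 240 = \frac{\pi}{2}$, and the two-tap FIR filter $y[n] = x[n] + x[n-2]$ has zeros exactly at $e^{\pm i\pi/2}$. This particular filter is an illustrative assumption, not necessarily the design the text has in mind: its magnitude response is $2|\cos\omega|$, so it also changes the gain at other frequencies rather than preserving them exactly.

```python
import math

def notch_60hz(x):
    """FIR notch for fs = 240 Hz: y[n] = x[n] + x[n-2], with zero initial conditions."""
    return [x[n] + (x[n - 2] if n >= 2 else 0.0) for n in range(len(x))]

fs = 240.0
hum = [math.cos(2 * math.pi * 60.0 * n / fs) for n in range(48)]   # pure 60 Hz tone
out = notch_60hz(hum)

# After the two-sample start-up transient the tone is cancelled.
print(max(abs(v) for v in out[2:]))

# A constant (0 Hz) input passes through with gain 2.
print(notch_60hz([1.0] * 6))
```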


Difference Equation 
(Blank Abstract) 


Introduction 


One of the most important concepts of DSP is to be able to properly represent the input/output
relationship of a given LTI system. A linear constant-coefficient difference equation (LCCDE)
serves as a way to express just this relationship in a discrete-time system. Writing the sequence of
inputs and outputs, which represent the characteristics of the LTI system, as a difference equation helps
in understanding and manipulating a system.


difference equation
An equation that shows the relationship between consecutive values of a sequence and the
differences among them. They are often rearranged as a recursive formula so that a system's
output can be computed from the input signal and past outputs.


Example: 
Equation: 


General Formulas for the Difference Equation 


As stated briefly in the definition above, a difference equation is a very useful tool in describing and 
calculating the output of the system described by the formula for a given sample n. The key property 
of the difference equation is its ability to help easily find the transform, H(z), of a system. In the 
following two subsections, we will look at the general form of the difference equation and the general 
conversion to a z-transform directly from the difference equation. 


Difference Equation 


The general form of a linear, constant-coefficient difference equation (LCCDE), is shown below: 
Equation: 


$$\sum_{k=0}^{N} a_k\, y[n-k] = \sum_{k=0}^{M} b_k\, x[n-k]$$


We can also write the general form to easily express a recursive output, which looks like this: 
Equation: 


$$y[n] = -\sum_{k=1}^{N} a_k\, y[n-k] + \sum_{k=0}^{M} b_k\, x[n-k]$$


From this equation, note that $y[n-k]$ represents the outputs and $x[n-k]$ represents the inputs. The
value of N represents the order of the difference equation and corresponds to the memory of the 
system being represented. Because this equation relies on past values of the output, in order to 
compute a numerical solution, certain past outputs, referred to as the initial conditions, must be 
known. 
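
The recursive form can be evaluated sample by sample. The following is a minimal sketch, assuming a_0 = 1, an input that is zero for n < 0, and initial conditions supplied as [y[-1], ..., y[-N]]; the function name lccde_filter and its argument layout are illustrative and do not come from the text.

```python
def lccde_filter(b, a, x, y_init=None):
    """Evaluate y[n] = -sum_{k=1}^{N} a[k] y[n-k] + sum_{k=0}^{M} b[k] x[n-k], with a[0] = 1.

    y_init holds the initial conditions [y[-1], y[-2], ..., y[-N]] (zeros if omitted);
    x[n] is taken to be zero for n < 0.
    """
    N = len(a) - 1
    past = list(y_init) if y_init is not None else [0.0] * N
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):            # feed-forward part: b_k x[n-k]
            if n - k >= 0:
                acc += b[k] * x[n - k]
        for k in range(1, N + 1):          # feedback part: -a_k y[n-k]
            prev = y[n - k] if n - k >= 0 else past[k - n - 1]
            acc -= a[k] * prev
        y.append(acc)
    return y


# First-order example: y[n] = x[n] + 0.5 y[n-1], driven by a unit impulse.
impulse_response = lccde_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0])
```

With zero input and a nonzero initial condition, the same function returns the response that is due to the system's memory alone.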


Conversion to Z-Transform 


Using the above formula, [link], we can easily generalize the transfer function, H(z), for any
difference equation. Below are the steps taken to convert any difference equation into its transfer
function, i.e. z-transform. The first step involves taking the z-transform of all the terms in
[link]. Then we use the linearity property to pull the transform inside the summation and the
time-shifting property of the z-transform to change the time-shifted terms into powers of z. Once this is
done, and with a_0 = 1, we arrive at the following equation:


Equation:
Y(z) = -\sum_{k=1}^{N} a_k Y(z) z^{-k} + \sum_{k=0}^{M} b_k X(z) z^{-k}

Equation:
H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}


Conversion to Frequency Response 


Once the z-transform has been calculated from the difference equation, we can go one step further to 
define the frequency response of the system, or filter, that is being represented by the difference 
equation. 


Note: Remember that the reason we are dealing with these formulas is to be able to aid us in filter 
design. A LCCDE is one of the easiest ways to represent FIR filters. By being able to find the 
frequency response, we will be able to look at the basic properties of any filter represented by a 
simple LCCDE. 


Below is the general formula for the frequency response of a z-transform. The conversion is simply a
matter of taking the z-transform formula, H(z), and replacing every instance of z with e^{i\omega}.
Equation:

H(\omega) = H(z)|_{z = e^{i\omega}} = \frac{\sum_{k=0}^{M} b_k e^{-(i\omega k)}}{\sum_{k=0}^{N} a_k e^{-(i\omega k)}}
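
The substitution z = e^{i\omega} is easy to carry out numerically. This is a small sketch under the assumption that the coefficients are stored as lists b = [b_0, ..., b_M] and a = [a_0, ..., a_N]; the function name freq_response is illustrative only.

```python
import cmath
import math


def freq_response(b, a, w):
    """Evaluate H(w) = sum_k b[k] e^{-iwk} / sum_k a[k] e^{-iwk} at one frequency w (radians/sample)."""
    num = sum(bk * cmath.exp(-1j * w * k) for k, bk in enumerate(b))
    den = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
    return num / den


# Two-point averager y[n] = 0.5 x[n] + 0.5 x[n-1]: passes DC, rejects w = pi.
H_dc = freq_response([0.5, 0.5], [1.0], 0.0)
H_pi = freq_response([0.5, 0.5], [1.0], math.pi)
```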


Once you understand the derivation of this formula, look at the module concerning Filter Design from 
the Z-Transform for a look into how all of these ideas of the Z-transform, Difference Equation, and 
Pole/Zero Plots play a role in filter design. 


Example 


Example: 

Finding Difference Equation 

Below is a basic example showing the opposite of the steps above: given a transfer function, one can
easily calculate the system's difference equation.

Equation:

H(z) = \frac{(z+1)^2}{(z - \frac{1}{2})(z + \frac{3}{4})}
Given this transfer function of a time-domain filter, we want to find the difference equation. To begin
with, expand both polynomials and divide them by the highest order z.
Equation:

H(z) = \frac{(z+1)(z+1)}{(z - \frac{1}{2})(z + \frac{3}{4})}
     = \frac{z^2 + 2z + 1}{z^2 + \frac{1}{4}z - \frac{3}{8}}
     = \frac{1 + 2z^{-1} + z^{-2}}{1 + \frac{1}{4}z^{-1} - \frac{3}{8}z^{-2}}
From this transfer function, the coefficients of the two polynomials will be our a_k and b_k values
found in the general difference equation formula, [link]. Using these coefficients and the above form
of the transfer function, we can easily write the difference equation:
Equation:

x[n] + 2x[n-1] + x[n-2] = y[n] + \frac{1}{4} y[n-1] - \frac{3}{8} y[n-2]


In our final step, we can rewrite the difference equation in its more common form showing the 
recursive nature of the system. 
Equation: 


y[n] = x[n] + 2x[n-1] + x[n-2] - \frac{1}{4} y[n-1] + \frac{3}{8} y[n-2]
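
The polynomial expansion in this example can be checked mechanically. The sketch below assumes the transfer function is H(z) = (z+1)^2 / ((z - 1/2)(z + 3/4)) and multiplies the factors as coefficient lists; poly_mul is a hypothetical helper written for this illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


# Numerator (z + 1)(z + 1) and denominator (z - 1/2)(z + 3/4).
b = poly_mul([1.0, 1.0], [1.0, 1.0])
a = poly_mul([1.0, -0.5], [1.0, 0.75])
```

Because both polynomials have degree two, dividing by z^2 leaves the same coefficient lists attached to z^0, z^{-1}, z^{-2}, so b and a can be read directly as the b_k and a_k of the difference equation.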


Solving a LCCDE 


In order for a linear constant-coefficient difference equation to be useful in analyzing an LTI system,
we must be able to find the system's output based upon a known input, x(n), and a set of initial
conditions. Two common methods exist for solving an LCCDE: the direct method and the indirect
method, the latter being based on the z-transform. Below we will briefly discuss the formulas for
solving an LCCDE using each of these methods.


Direct Method 


The final solution to the output based on the direct method is the sum of two parts, expressed in the 
following equation: 
Equation: 


y(n) = y_h(n) + y_p(n)


The first part, y_h(n), is referred to as the homogeneous solution and the second part, y_p(n), is
referred to as the particular solution. The following method is very similar to that used to solve many
differential equations, so if you have taken a differential calculus course or used differential equations
before then this should seem very familiar.


Homogeneous Solution 


We begin by assuming that the input is zero, x(n) = 0. Now we simply need to solve the 
homogeneous difference equation: 
Equation: 


\sum_{k=0}^{N} a_k y[n-k] = 0


In order to solve this, we will make the assumption that the solution is in the form of an exponential.
We will use lambda, \lambda, to represent our exponential terms. We now have to solve the following
equation:

Equation:


\sum_{k=0}^{N} a_k \lambda^{n-k} = 0


We can expand this equation out and factor out all of the lambda terms. This will give us a large
polynomial in parentheses, which is referred to as the characteristic polynomial. The roots of this
polynomial will be the key to solving the homogeneous equation. If all of the roots are distinct, then
the general solution to the equation will be as follows:

Equation:


y_h(n) = C_1 (\lambda_1)^n + C_2 (\lambda_2)^n + \dots + C_N (\lambda_N)^n


However, if the characteristic equation contains multiple roots then the above general solution will be
slightly different. Below we have the modified version for an equation where \lambda_1 is a root of
multiplicity K; each power of n carries its own constant:

Equation:


y_h(n) = (C_{1,0} + C_{1,1} n + C_{1,2} n^2 + \dots + C_{1,K-1} n^{K-1}) (\lambda_1)^n + C_2 (\lambda_2)^n + \dots + C_{N-K+1} (\lambda_{N-K+1})^n
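
As a check on the distinct-root case, the sketch below solves a second-order homogeneous equation both ways: by the characteristic roots and by direct recursion. The coefficients and the starting values y[0] = 1, y[1] = 0 are arbitrary choices made for this illustration.

```python
import cmath

# Homogeneous equation y[n] + a1*y[n-1] + a2*y[n-2] = 0 (illustrative coefficients).
a1, a2 = -5.0 / 6.0, 1.0 / 6.0

# Characteristic polynomial: lambda^2 + a1*lambda + a2 = 0.
disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
lam1, lam2 = (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

# Fit C1, C2 to two known output values.
y0, y1 = 1.0, 0.0
C2 = (y1 - lam1 * y0) / (lam2 - lam1)
C1 = y0 - C2


def y_closed(n):
    """Closed-form homogeneous solution C1*lam1^n + C2*lam2^n."""
    return (C1 * lam1 ** n + C2 * lam2 ** n).real


# Reference values from running the recursion directly.
y = [y0, y1]
for n in range(2, 10):
    y.append(-a1 * y[n - 1] - a2 * y[n - 2])
```

Here the roots come out as 1/2 and 1/3, and the closed form reproduces the recursion sample for sample.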


Particular Solution 


The particular solution, y_p(n), will be any solution that will solve the general difference equation:
Equation:


\sum_{k=0}^{N} a_k y_p(n-k) = \sum_{k=0}^{M} b_k x(n-k)


In order to solve, our guess for the solution to y,(n) will take on the form of the input, x(n). After 
guessing at a solution to the above equation involving the particular solution, one only needs to plug 
the solution into the difference equation and solve it out. 


Indirect Method 


The indirect method utilizes the relationship between the difference equation and z-transform, 
discussed earlier, to find a solution. The basic idea is to convert the difference equation into a z- 
transform, as described above, to get the resulting output, Y(z). Then by inverse transforming this 
and using partial-fraction expansion, we can arrive at the solution. 

The unilateral z-transform turns a time advance into multiplication by z, with a correction term that
carries the initial condition:

Equation:


Z\{y(n+1)\} = z Y(z) - z y(0)


This can be iteratively extended to an advance of arbitrary order k, as in Equation [link].
Equation:


Z\{y(n+k)\} = z^k Y(z) - \sum_{m=0}^{k-1} z^{k-m} y(m)


Now consider a difference equation written in advance form, \sum_{k=0}^{N} a_k y(n+k) = x(n). The
z-transform of each side of the difference equation can be taken
Equation:


Z\{\sum_{k=0}^{N} a_k y(n+k)\} = Z\{x(n)\}


which by linearity results in
Equation:


\sum_{k=0}^{N} a_k Z\{y(n+k)\} = Z\{x(n)\}


and by the time-shifting property in
Equation:


\sum_{k=0}^{N} a_k \left( z^k Z\{y(n)\} - \sum_{m=0}^{k-1} z^{k-m} y(m) \right) = Z\{x(n)\}


Rearranging terms to isolate the z-transform of the output,
Equation:


Z\{y(n)\} = \frac{Z\{x(n)\} + \sum_{k=0}^{N} \sum_{m=0}^{k-1} a_k z^{k-m} y(m)}{\sum_{k=0}^{N} a_k z^k}


Thus, it is found that
Equation:


Y(z) = \frac{X(z) + \sum_{k=0}^{N} \sum_{m=0}^{k-1} a_k z^{k-m} y(m)}{\sum_{k=0}^{N} a_k z^k}


where the inner sum is empty for k = 0. In order to find the output, it only remains to find the
z-transform X(z) of the input, substitute the initial conditions, and compute the inverse z-transform of
the result. Partial fraction expansions are often required for this last step. This may sound daunting
while looking at [link], but it is often easy in practice, especially for low order difference equations.
[link] can also be used to determine the transfer function and frequency response.


As an example, consider the difference equation
Equation:


y(n+2) + 4y(n+1) + 3y(n) = u(n)


with the initial conditions y(0) = 0 and y(1) = 1. Applying the time-advance property
Z\{y(n+k)\} = z^k Y(z) - \sum_{m=0}^{k-1} z^{k-m} y(m) to each term, and using X(z) = \frac{z}{z-1}
for the unit step, the z-transform of the solution y(n) is given by
Equation:


Y(z) = \frac{\frac{z}{z-1} + z}{z^2 + 4z + 3} = \frac{z^2}{(z-1)(z+1)(z+3)}


Performing a partial fraction decomposition of Y(z)/z, this also equals
Equation:


Y(z) = \frac{1}{8} \frac{z}{z-1} + \frac{1}{4} \frac{z}{z+1} - \frac{3}{8} \frac{z}{z+3}


Computing the inverse z-transform,
Equation:


y(n) = \left( \frac{1}{8} + \frac{1}{4} (-1)^n - \frac{3}{8} (-3)^n \right) u(n)


One can check that this satisfies both the difference equation and the initial
conditions.


The Z Transform: Definition 
A brief definition of the z-transform, explaining its relationship with the 
Fourier transform and its region of convergence, ROC. 


Basic Definition of the Z-Transform 


The z-transform of a sequence is defined as
Equation:

X(z) = \sum_{n=-\infty}^{\infty} x[n] z^{-n}

Sometimes this equation is referred to as the bilateral z-transform. At
times the z-transform is defined as
Equation:

X(z) = \sum_{n=0}^{\infty} x[n] z^{-n}

which is known as the unilateral z-transform.


There is a close relationship between the z-transform and the Fourier
transform of a discrete time signal, which is defined as
Equation:

X(e^{i\omega}) = \sum_{n=-\infty}^{\infty} x[n] e^{-(i\omega n)}

Notice that when z^{-n} is replaced with e^{-(i\omega n)} the z-transform
reduces to the Fourier Transform. When the Fourier Transform exists,
z = e^{i\omega}, which is to have the magnitude of z equal to unity.


The Complex Plane 


In order to get further insight into the relationship between the Fourier 
Transform and the Z-Transform it is useful to look at the complex plane or 
z-plane. Take a look at the complex plane: 

Z-Plane 


The Z-plane is a complex plane with an imaginary and real axis referring to
the complex-valued variable z. The position on the complex plane is given
by re^{i\omega}, and the angle from the positive, real axis around the plane is
denoted by \omega. X(z) is defined everywhere on this plane. X(e^{i\omega}) on the
other hand is defined only where |z| = 1, which is referred to as the unit
circle. So for example, \omega = 0 at z = 1 and \omega = \pi at z = -1. This is useful
because, by representing the Fourier transform as the z-transform on the
unit circle, the periodicity of the Fourier transform is easily seen.


Region of Convergence 


The region of convergence, known as the ROC, is important to understand
because it defines the region where the z-transform exists. The ROC for a
given x[n] is defined as the range of z for which the z-transform
converges. Since the z-transform is a power series, it converges when
x[n] z^{-n} is absolutely summable. Stated differently,

Equation:

\sum_{n=-\infty}^{\infty} |x[n] z^{-n}| < \infty

must be satisfied for convergence. This is best illustrated by looking at the
different ROC's of the z-transforms of \alpha^n u[n] and \alpha^n u[-n-1].


Example:
For
Equation:

x[n] = \alpha^n u[n]

where \alpha = 0.5.

Equation:

X(z) = \sum_{n=-\infty}^{\infty} \alpha^n u[n] z^{-n} = \sum_{n=0}^{\infty} (\alpha z^{-1})^n

This sequence is an example of a right-sided exponential sequence because
it is nonzero for n ≥ 0. It only converges when |\alpha z^{-1}| < 1. When it
converges,

Equation:
X(z) = \frac{1}{1 - \alpha z^{-1}} = \frac{z}{z - \alpha}

If |\alpha z^{-1}| ≥ 1, then the series \sum_{n=0}^{\infty} (\alpha z^{-1})^n does not converge. Thus
the ROC is the range of values where

Equation:
|\alpha z^{-1}| < 1
or, equivalently,
Equation:
|z| > |\alpha|

Figure: ROC for x[n] = \alpha^n u[n] where \alpha = 0.5


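Example:
For

Equation:

x[n] = -(\alpha^n) u[-n-1]

where \alpha = 0.5.

Equation:

X(z) = \sum_{n=-\infty}^{\infty} x[n] z^{-n}
     = \sum_{n=-\infty}^{\infty} -(\alpha^n) u[-n-1] z^{-n}
     = -\sum_{n=-\infty}^{-1} \alpha^n z^{-n}
     = -\sum_{n=-\infty}^{-1} (\alpha^{-1} z)^{-n}
     = -\sum_{n=1}^{\infty} (\alpha^{-1} z)^{n}
     = 1 - \sum_{n=0}^{\infty} (\alpha^{-1} z)^{n}

The ROC in this case is the range of values where
Equation:

|\alpha^{-1} z| < 1

or, equivalently,
Equation:

|z| < |\alpha|

If the ROC is satisfied, then
Equation:

X(z) = 1 - \frac{1}{1 - \alpha^{-1} z} = \frac{z}{z - \alpha}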

Table of Common z-Transforms 


Lists the z-transform and region of convergence (ROC) for several common 
discrete-time signals. 


The table below provides a number of unilateral and bilateral z-transforms. 
The table also specifies the region of convergence. 


Note: The notation for z found in the table below may differ from that
found in other tables. For example, the basic z-transform of u[n] can be
written as either of the following two expressions, which are equivalent:

Equation:
\frac{z}{z - 1} = \frac{1}{1 - z^{-1}}

Signal                                                      Z-Transform                                           ROC
δ[n - k]                                                    z^{-k}                                                all z
u[n]                                                        z / (z - 1)                                           |z| > 1
-u[-n - 1]                                                  z / (z - 1)                                           |z| < 1
n^2 u[n]                                                    z(z + 1) / (z - 1)^3                                  |z| > 1
n α^n u[n]                                                  αz / (z - α)^2                                        |z| > |α|
(\prod_{k=1}^{m} (n - k + 1)) / (α^m m!) α^n u[n]           z / (z - α)^{m+1}                                     |z| > |α|
γ^n cos(αn) u[n]                                            z(z - γ cos(α)) / (z^2 - (2γ cos(α))z + γ^2)          |z| > |γ|
γ^n sin(αn) u[n]                                            zγ sin(α) / (z^2 - (2γ cos(α))z + γ^2)                |z| > |γ|


Understanding Pole/Zero Plots on the Z-Plane 

This module will look at the relationships between the z-transform and the 
complex plane. Specifically, the creation of pole/zero plots and some of 
their useful properties are discussed. 


Introduction to Poles and Zeros of the Z-Transform 


It is quite difficult to qualitatively analyze the Laplace transform and Z- 
transform, since mappings of their magnitude and phase or real part and 
imaginary part result in multiple mappings of 2-dimensional surfaces in 3- 
dimensional space. For this reason, it is very common to examine a plot of a 
transfer function's poles and zeros to try to gain a qualitative idea of what a 
system does. 


Once the Z-transform of a system has been determined, one can use the
information contained in the function's polynomials to graphically represent the
function and easily observe many defining characteristics. The Z-transform
will have the below structure, based on Rational Functions:

Equation:

X(z) = \frac{P(z)}{Q(z)}


The two polynomials, P(z) and Q(z), allow us to find the poles and zeros 
of the Z-Transform. 


Zeros 
The value(s) for z where P(z) = 0. 
The complex frequencies that make the overall gain of the filter 
transfer function zero. 


poles 
The value(s) for z where Q(z) = 0. 
The complex frequencies that make the overall gain of the filter 
transfer function infinite. 


Example:
Below is a simple transfer function with the poles and zeros shown below
it.

H(z) = \frac{z + 1}{(z - \frac{1}{2})(z + \frac{3}{4})}

The zeros are: {-1}

The poles are: {\frac{1}{2}, -\frac{3}{4}}


The Z-Plane 


Once the poles and zeros have been found for a given Z-Transform, they
can be plotted onto the Z-Plane. The Z-plane is a complex plane with an
imaginary and real axis referring to the complex-valued variable z. The
position on the complex plane is given by re^{i\theta} and the angle from the
positive, real axis around the plane is denoted by \theta. When mapping poles
and zeros onto the plane, poles are denoted by an "x" and zeros by an "o".
The below figure shows the Z-Plane, and examples of plotting zeros and
poles onto the plane can be found in the following section.

Figure: Z-Plane, with axes Re(z) and Im(z) and a point marked at re^{i\theta}.


Examples of Pole/Zero Plots 


This section lists several examples of finding the poles and zeros of a 
transfer function and then plotting them onto the Z-Plane. 


Example:
Simple Pole/Zero Plot

H(z) = \frac{z}{(z - \frac{1}{2})(z + \frac{3}{4})}

The zeros are: {0}
The poles are: {\frac{1}{2}, -\frac{3}{4}}

Figure: Pole/Zero Plot, with axes Re(z) and Im(z). Using the zeros and poles found from the
transfer function, the one zero is mapped to zero and the two poles are placed at \frac{1}{2}
and -\frac{3}{4}.

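Example:
Complex Pole/Zero Plot

H(z) = \frac{(z - i)(z + i)}{(z + 1)(z - (\frac{1}{2} + \frac{1}{2}i))(z - (\frac{1}{2} - \frac{1}{2}i))}

The zeros are: {i, -i}

The poles are: {-1, \frac{1}{2} + \frac{1}{2}i, \frac{1}{2} - \frac{1}{2}i}

Figure: Pole/Zero Plot. Using the zeros and poles found from the transfer function, the
zeros are mapped to \pm i, and the poles are placed at -1, \frac{1}{2} + \frac{1}{2}i and
\frac{1}{2} - \frac{1}{2}i.
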


Example:
Pole-Zero Cancellation
An easy mistake to make with regards to poles and zeros is to think that a
function like \frac{(s+3)(s-1)}{s-1} is the same as s + 3. In theory they are equivalent,
as the pole and zero at s = 1 cancel each other out in what is known as
pole-zero cancellation. However, think about what may happen if this
were a transfer function of a system that was created with physical circuits.
In this case, it is very unlikely that the pole and zero would remain in
exactly the same place. A minor temperature change, for instance, could
cause one of them to move just slightly. If this were to occur, a tremendous
amount of volatility is created in that area, since there is a change from
infinity at the pole to zero at the zero in a very small range of signals. This
is generally a very bad way to try to eliminate a pole. A much better way is
to use control theory to move the pole to a better place.


Note: 

Repeated Poles and Zeros 

It is possible to have more than one pole or zero at any given point. For
instance, the discrete-time transfer function H(z) = z^2 will have two zeros
at the origin and the continuous-time function H(s) = \frac{1}{s^{25}} will have 25
poles at the origin.


MATLAB - If access to MATLAB is readily available, then you can use its 
functions to easily create pole/zero plots. Below is a short program that 
plots the poles and zeros from the above example onto the Z-Plane. 


% Set up vector for zeros
z = [j ; -j];

% Set up vector for poles
p = [-1 ; .5+.5j ; .5-.5j];

% Plot the zeros (o) and poles (x) on the Z-plane
figure(1);
zplane(z,p);
title('Pole/Zero Plot for Complex Pole/Zero Plot Example');




Applications for pole-zero plots 


Stability and Control theory 


Now that we have found and plotted the poles and zeros, we must ask what 
it is that this plot gives us. Basically what we can gather from this is that the 
magnitude of the transfer function will be larger when it is closer to the 
poles and smaller when it is closer to the zeros. This provides us with a 
qualitative understanding of what the system does at various frequencies 
and is crucial to the discussion of stability. 
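
This geometric picture can be turned into a few lines of code: on the unit circle, |H| is the product of distances to the zeros divided by the product of distances to the poles. The sketch below assumes the zero and poles of the Simple Pole/Zero Plot example (one zero at 0, poles at 1/2 and -3/4); the function name magnitude and the gain argument are illustrative.

```python
import cmath
import math


def magnitude(zeros, poles, w, gain=1.0):
    """|H(e^{iw})| computed from distances between e^{iw} and each zero and pole."""
    z = cmath.exp(1j * w)
    num = 1.0
    for q in zeros:
        num *= abs(z - q)
    den = 1.0
    for p in poles:
        den *= abs(z - p)
    return gain * num / den


zeros, poles = [0.0], [0.5, -0.75]
mag_0 = magnitude(zeros, poles, 0.0)             # z = 1, nearest the pole at 1/2
mag_half = magnitude(zeros, poles, math.pi / 2)  # z = i, far from both poles
mag_pi = magnitude(zeros, poles, math.pi)        # z = -1, nearest the pole at -3/4
```

The response is largest at \omega = \pi, where the unit circle passes closest to the pole at -3/4, and smallest of the three at \omega = \pi/2.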


Pole/Zero Plots and the Region of Convergence 


The region of convergence (ROC) for X(z) in the complex Z-plane can be 
determined from the pole/zero plot. Although several regions of 
convergence may be possible, where each one corresponds to a different 
impulse response, there are some choices that are more practical. A ROC 
can be chosen to make the transfer function causal and/or stable depending 
on the pole/zero plot. 

Filter Properties from ROC 


• If the ROC extends outward from the outermost pole, then the system
is causal.
• If the ROC includes the unit circle, then the system is stable.


Below is a pole/zero plot with a possible ROC of the Z-transform in the 
Simple Pole/Zero Plot discussed earlier. The shaded region indicates the 
ROC chosen for the filter. From this figure, we can see that the filter will be 
both causal and stable since the above listed conditions are both met. 


Example:
H(z) = \frac{z}{(z - \frac{1}{2})(z + \frac{3}{4})}

Figure: Region of Convergence for the Pole/Zero Plot. The shaded area represents the chosen
ROC for the transfer function.
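
For a causal system the two ROC rules combine into a single test: the ROC is |z| > (largest pole magnitude), so it contains the unit circle exactly when every pole lies strictly inside it. A minimal sketch, with hypothetical helper names:

```python
def causal_roc_radius(poles):
    """Inner radius of the ROC |z| > r for the causal choice of ROC."""
    return max(abs(p) for p in poles)


def is_stable_causal(poles):
    """True when the causal ROC contains the unit circle."""
    return causal_roc_radius(poles) < 1.0


simple_poles = [0.5, -0.75]                      # poles of the simple example
complex_poles = [-1.0, 0.5 + 0.5j, 0.5 - 0.5j]   # poles of the complex example
```

The second pole set has a pole on the unit circle at z = -1, so no causal ROC can include the unit circle for it.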


Frequency Response and Pole/Zero Plots 


The reason it is helpful to understand and create these pole/zero plots is due 
to their ability to help us easily design a filter. Based on the location of the 
poles and zeros, the magnitude response of the filter can be quickly 
understood. Also, by starting with the pole/zero plot, one can design a filter 
and obtain its transfer function very easily. 


Overview of Digital Filter Design 
Advantages of FIR filters 


1. Straightforward conceptually and simple to implement

2. Can be implemented with fast convolution 

3. Always stable 

4. Relatively insensitive to quantization 

5. Can have linear phase (same time delay of all frequencies) 


Advantages of IIR filters 


1. Better for approximating analog systems 

2. For a given magnitude response specification, IIR filters often require 
much less computation than an equivalent FIR, particularly for narrow 
transition bands 


Both FIR and IIR filters are very important in applications. 
Generic Filter Design Procedure 


1. Choose a desired response, based on application requirements 
2. Choose a filter class 

3. Choose a quality measure 

4. Solve for the filter in class 2 optimizing criterion in 3 


Perspective on FIR filtering 


Most of the time, people do L^\infty optimal design, using the Parks-McClellan
algorithm. This is probably the second most important technique in
"classical" signal processing (after the Cooley-Tukey (radix-2) FFT).


Most of the time, FIR filters are designed to have linear phase. The most
important advantage of FIR filters over IIR filters is that they can have
exactly linear phase. There are advanced design techniques for minimum-phase
filters, constrained L^2 optimal designs, etc. (see chapter 8 of text).
However, if only the magnitude of the response is important, IIR filters
usually require far fewer operations and are typically used, so the bulk of
FIR filter design work has concentrated on linear phase designs.


Linear Phase Filters 


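In general, for -\pi \le \omega \le \pi, the frequency response can be written as

H(\omega) = |H(\omega)| e^{-(i\theta(\omega))}

Strictly speaking, H(\omega) is linear phase if

H(\omega) = |H(\omega)| e^{-(i\omega K)} e^{-(i\theta_0)}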


Why is this important? A linear phase response gives the same time delay for ALL frequencies! (Remember the 
shift theorem.) This is very desirable in many applications, particularly when the appearance of the time-domain 
waveform is of interest, such as in an oscilloscope. (see [link]) 


Figure: An input x(t) passed through a linear phase filter (LP-FIR) and through a non-linear phase filter (IIR), with the resulting outputs y(t).


Restrictions on h(n) to get linear phase 
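Equation:

H(\omega) = \sum_{n=0}^{M-1} h(n) e^{-(i\omega n)}
          = h(0) + h(1) e^{-(i\omega)} + h(2) e^{-(i2\omega)} + \dots + h(M-1) e^{-(i\omega(M-1))}
          = e^{-(i\omega\frac{M-1}{2})} \left( h(0) e^{i\omega\frac{M-1}{2}} + \dots + h(M-1) e^{-(i\omega\frac{M-1}{2})} \right)
          = e^{-(i\omega\frac{M-1}{2})} \left( (h(0) + h(M-1)) \cos(\frac{M-1}{2}\omega) + (h(1) + h(M-2)) \cos(\frac{M-3}{2}\omega) + \dots + i \left( (h(0) - h(M-1)) \sin(\frac{M-1}{2}\omega) + (h(1) - h(M-2)) \sin(\frac{M-3}{2}\omega) + \dots \right) \right)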


For linear phase, we require the right side of [link] to be e^{-(i\theta_0)} (real, positive function of \omega). For \theta_0 = 0, we
thus require


h(0) + h(M - 1) = real number
h(0) - h(M - 1) = pure imaginary number
h(1) + h(M - 2) = pure real number
h(1) - h(M - 2) = pure imaginary number


Thus h(k) = h^*(M - 1 - k) is a necessary condition for the right side of [link] to be real valued, for \theta_0 = 0.
For \theta_0 = \frac{\pi}{2}, or e^{-(i\theta_0)} = -i, we require
h(0) + h(M - 1) = pure imaginary
h(0) - h(M - 1) = pure real number


\Rightarrow h(k) = -(h^*(M - 1 - k))


Usually, one is interested in filters with real-valued coefficients; see [link] and [link].

Figure: \theta_0 = 0 (Symmetric Filters), h(k) = h(M - 1 - k). Panels: M odd, where \frac{M-1}{2} is an integer, and M even, where \frac{M-1}{2} is a fraction.

Figure: \theta_0 = \frac{\pi}{2} (Anti-Symmetric Filters), h(k) = -h(M - 1 - k). Panels: M odd and M even.


Filter design techniques are usually slightly different for each of these four different filter types. We will study the 
most common case, symmetric-odd length, in detail, and often leave the others for homework or tests or for when 
one encounters them in practice. Even-symmetric filters are often used; the anti-symmetric filters are rarely used in 
practice, except for special classes of filters, like differentiators or Hilbert transformers, in which the desired 
response is anti-symmetric. 


So far, we have satisfied the condition that H(\omega) = A(\omega) e^{-(i\theta_0)} e^{-(i\omega\frac{M-1}{2})} where A(\omega) is real-valued. However,
we have not assured that A(\omega) is non-negative. In general, this makes the design techniques much more difficult,
so most FIR filter design methods actually design filters with Generalized Linear Phase:

H(\omega) = A(\omega) e^{-(i\omega\frac{M-1}{2})}, where A(\omega) is real-valued, but possibly negative. A(\omega) is called the amplitude of the
frequency response.


Note: A(w) usually goes negative only in the stopband, and the stopband phase response is generally 
unimportant. 


Note: |H(\omega)| = \pm A(\omega) = A(\omega) e^{-(i\pi\frac{1}{2}(1 - \mathrm{sign}(A(\omega))))}, where sign(x) = 1 if x > 0 and sign(x) = -1 if x < 0.


Example:
Lowpass Filter

Figure panels: Desired |H(\omega)|; Desired \angle H(\omega); Actual |H(\omega)|, where A(\omega) goes negative; Actual \angle H(\omega), with 2\pi phase jumps due to periodicity of phase and \pi phase jumps due to sign change in A(\omega).


Time-delay introduces generalized linear phase.


Note: For odd-length FIR filters, a linear-phase design procedure is equivalent to a zero-phase design procedure
followed by an \frac{M-1}{2}-sample delay of the impulse response. For even-length filters, the delay is non-integer, and
the linear phase must be incorporated directly in the desired response!
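
The generalized linear phase property can be observed numerically: for a real symmetric h, multiplying H(\omega) by e^{+i\omega(M-1)/2} removes the delay and leaves a purely real amplitude A(\omega), which may be negative. The sketch below uses the length-3 filter h = [1, 1, 1], chosen for illustration because its amplitude 1 + 2cos(\omega) changes sign; the helper names are hypothetical.

```python
import cmath
import math


def dtft(h, w):
    """H(w) = sum_n h[n] e^{-iwn} for a finite-length h."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))


def amplitude(h, w):
    """H(w) with the (M-1)/2-sample delay removed; real-valued A(w) when h is real and symmetric."""
    M = len(h)
    return dtft(h, w) * cmath.exp(1j * w * (M - 1) / 2.0)


h = [1.0, 1.0, 1.0]
A_dc = amplitude(h, 0.0)
A_pi = amplitude(h, math.pi)
```

At \omega = \pi the amplitude is -1: the magnitude is 1 and the sign change shows up as a \pi jump in the phase.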


Window Design Method 


The truncate-and-delay design procedure is the simplest and most obvious FIR design procedure. 
Exercise: 


Problem: Is it any Good? 
Solution: 


Yes; in fact it's optimal! (in a certain sense) 


L2 optimization criterion 


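find \forall n, 0 \le n \le M-1 : h[n], minimizing the energy difference between the desired response and the actual
response: i.e., find


\min_{h[n]} \left\{ \int_{-\pi}^{\pi} |H_d(\omega) - H(\omega)|^2 \, d\omega \right\}


by Parseval's relationship
Equation:


\min_{h[n]} \left\{ \int_{-\pi}^{\pi} |H_d(\omega) - H(\omega)|^2 \, d\omega \right\}
  = \min_{h[n]} \left\{ 2\pi \sum_{n=-\infty}^{\infty} |h_d[n] - h[n]|^2 \right\}
  = \min_{h[n]} \left\{ 2\pi \left( \sum_{n=-\infty}^{-1} |h_d[n] - h[n]|^2 + \sum_{n=0}^{M-1} |h_d[n] - h[n]|^2 + \sum_{n=M}^{\infty} |h_d[n] - h[n]|^2 \right) \right\}

Since h[n] = 0 for n < 0 and for n \ge M, this becomes

\min_{h[n]} \left\{ 2\pi \left( \sum_{n=-\infty}^{-1} |h_d[n]|^2 + \sum_{n=0}^{M-1} |h[n] - h_d[n]|^2 + \sum_{n=M}^{\infty} |h_d[n]|^2 \right) \right\}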


Note: h[n] has no influence on the first and last sums. 


The best we can do is let


h[n] = h_d[n] if 0 \le n \le M-1, and h[n] = 0 otherwise.


Thus h[n] = h_d[n] w[n], with


w[n] = 1 if 0 \le n \le M-1, and w[n] = 0 otherwise,


is optimal in a least-total-squared-error (L2, or energy) sense!
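
As a concrete instance of truncate-and-delay, the sketch below builds a length-M lowpass filter by sampling the delayed ideal lowpass impulse response h_d[n] = sin(\omega_c (n - D)) / (\pi (n - D)) with D = (M-1)/2 and keeping only 0 ≤ n ≤ M-1. The length M = 21 and cutoff \omega_c = 0.4\pi are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def ideal_lowpass(n, wc, delay):
    """Sample of the ideal lowpass impulse response with cutoff wc, delayed by `delay` samples."""
    t = n - delay
    if t == 0:
        return wc / math.pi
    return math.sin(wc * t) / (math.pi * t)


def truncated_lowpass(M, wc):
    """Boxcar-truncated (L2-optimal) length-M approximation of the ideal lowpass filter."""
    delay = (M - 1) / 2.0
    return [ideal_lowpass(n, wc, delay) for n in range(M)]


h = truncated_lowpass(21, 0.4 * math.pi)
```

The resulting coefficients are symmetric about n = 10, so the filter has linear phase.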
Exercise: 


Problem: Why, then, is this design often considered undesirable?


Solution:


Gibbs Phenomenon


Figure: A(\omega) for small M and for large M, with the ripple near the band edge labeled approximately 0.11 in both panels.


For desired spectra with discontinuities, the least-square designs are poor in a minimax (worst-case, or L^\infty) error
sense.


Window Design Method 


Apply a more gradual truncation to reduce "ringing" (Gibbs Phenomenon)
\forall n, 0 \le n \le M-1 : h[n] = h_d[n] w[n]


Note: H(\omega) = H_d(\omega) * W(\omega), where * denotes convolution in frequency.


The window design procedure (except for the boxcar window) is ad-hoc and not optimal in any usual sense.
However, it is very simple, so it is sometimes used for "quick-and-dirty" designs or if the error criterion is itself
heuristic.
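
The effect of a gradual window on the ringing can be seen by comparing stopband peaks. The sketch below designs the same lowpass filter with a boxcar window and with a Hamming window (a common choice that the text does not single out), then measures the largest |H(\omega)| over a band chosen to lie beyond both transition regions. All numerical settings are illustrative assumptions.

```python
import cmath
import math


def ideal_lowpass(n, wc, delay):
    t = n - delay
    return wc / math.pi if t == 0 else math.sin(wc * t) / (math.pi * t)


def hamming(n, M):
    """Symmetric Hamming window of length M."""
    return 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (M - 1))


def peak_magnitude(h, w_lo, w_hi, points=400):
    """Largest |H(w)| on a uniform grid over [w_lo, w_hi]."""
    peak = 0.0
    for j in range(points + 1):
        w = w_lo + (w_hi - w_lo) * j / points
        H = sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))
        peak = max(peak, abs(H))
    return peak


M, wc = 41, 0.4 * math.pi
delay = (M - 1) / 2.0
h_rect = [ideal_lowpass(n, wc, delay) for n in range(M)]    # boxcar window
h_hamm = [h_rect[n] * hamming(n, M) for n in range(M)]      # Hamming window

rect_peak = peak_magnitude(h_rect, 0.55 * math.pi, math.pi)
hamm_peak = peak_magnitude(h_hamm, 0.55 * math.pi, math.pi)
```

The windowed design trades a wider transition band for a lower stopband peak, which is the usual reason for preferring it when worst-case error matters.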


Frequency Sampling Design Method for FIR filters 


Given a desired frequency response, the frequency sampling design method designs a filter
with a frequency response exactly equal to the desired response at a particular set of
frequencies \omega_k.

Procedure
Equation:

\forall k, k = [0, 1, \dots, N-1] : H_d(\omega_k) = \sum_{n=0}^{M-1} h(n) e^{-(i\omega_k n)}


Note: Desired Response must include linear phase shift (if linear phase is desired)


Exercise: 


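Problem: What is H_d(\omega) for an ideal lowpass filter, cutoff at \omega_c?


Solution:


H_d(\omega) = e^{-(i\omega\frac{M-1}{2})} if -\omega_c \le \omega \le \omega_c
H_d(\omega) = 0 if (-\pi \le \omega < -\omega_c) or (\omega_c < \omega \le \pi)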

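Note: This set of linear equations can be written in matrix form


Equation:


\begin{pmatrix} H_d(\omega_0) \\ H_d(\omega_1) \\ \vdots \\ H_d(\omega_{N-1}) \end{pmatrix}
=
\begin{pmatrix}
e^{-(i\omega_0 0)} & e^{-(i\omega_0 1)} & \cdots & e^{-(i\omega_0 (M-1))} \\
e^{-(i\omega_1 0)} & e^{-(i\omega_1 1)} & \cdots & e^{-(i\omega_1 (M-1))} \\
\vdots & \vdots & & \vdots \\
e^{-(i\omega_{N-1} 0)} & e^{-(i\omega_{N-1} 1)} & \cdots & e^{-(i\omega_{N-1} (M-1))}
\end{pmatrix}
\begin{pmatrix} h(0) \\ h(1) \\ \vdots \\ h(M-1) \end{pmatrix}

or
H_d = W h
So
Equation:
h = W^{-1} H_d


Note: W is a square matrix for N = M, and invertible as long as \omega_i \ne \omega_j + 2\pi l, i \ne j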


Important Special Case 


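What if the frequencies are equally spaced between 0 and 2\pi, i.e. \omega_k = \frac{2\pi k}{M} + \alpha


Then
H_d(\omega_k) = \sum_{n=0}^{M-1} h(n) e^{-(i\frac{2\pi k n}{M})} e^{-(i\alpha n)} = \sum_{n=0}^{M-1} \left( h(n) e^{-(i\alpha n)} \right) e^{-(i\frac{2\pi k n}{M})} = DFT!
so
h(n) e^{-(i\alpha n)} = \frac{1}{M} \sum_{k=0}^{M-1} H_d(\omega_k) e^{i\frac{2\pi n k}{M}}
or
h[n] = \frac{e^{i\alpha n}}{M} \sum_{k=0}^{M-1} H_d[\omega_k] e^{i\frac{2\pi n k}{M}} = e^{i\alpha n} IDFT[H_d[\omega_k]]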
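
For equally spaced samples with \alpha = 0, the design reduces to one inverse DFT. The sketch below assumes M = 9, a symmetric amplitude A[k] = A[M - k] of a crude lowpass shape, and a desired response that already includes the linear phase term e^{-(i\omega_k (M-1)/2)}; these values and the function names are illustrative only.

```python
import cmath
import math


def freq_sample_design(Hd):
    """h[n] = (1/M) sum_k Hd[k] e^{+i 2 pi n k / M}, i.e. the IDFT of the samples (alpha = 0)."""
    M = len(Hd)
    return [sum(Hd[k] * cmath.exp(2j * math.pi * n * k / M) for k in range(M)) / M
            for n in range(M)]


def dtft(h, w):
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))


M = 9
A = [1, 1, 1, 0, 0, 0, 0, 1, 1]   # amplitude samples, symmetric: A[k] == A[M - k]
Hd = [A[k] * cmath.exp(-1j * (2.0 * math.pi * k / M) * (M - 1) / 2.0) for k in range(M)]
h = freq_sample_design(Hd)
```

The computed coefficients are real up to rounding and symmetric, and the resulting H(\omega) passes through every sample H_d(\omega_k); nothing constrains it between the samples.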


Important Special Case #2 


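h[n] symmetric, linear phase, and has real coefficients. Since h[n] = h[M - 1 - n], there are
only about \frac{M}{2} degrees of freedom (\frac{M}{2} for M even, \frac{M+1}{2} for M odd), and only that many linear
equations are required.
Equation:


H[\omega_k] = \sum_{n=0}^{M-1} h[n] e^{-(i\omega_k n)}

  = \sum_{n=0}^{\frac{M}{2}-1} h[n] \left( e^{-(i\omega_k n)} + e^{-(i\omega_k (M-n-1))} \right) if M even
  = \sum_{n=0}^{\frac{M-3}{2}} h[n] \left( e^{-(i\omega_k n)} + e^{-(i\omega_k (M-n-1))} \right) + h[\frac{M-1}{2}] e^{-(i\omega_k \frac{M-1}{2})} if M odd

  = e^{-(i\omega_k \frac{M-1}{2})} \, 2 \sum_{n=0}^{\frac{M}{2}-1} h[n] \cos(\omega_k (\frac{M-1}{2} - n)) if M even
  = e^{-(i\omega_k \frac{M-1}{2})} \left( 2 \sum_{n=0}^{\frac{M-3}{2}} h[n] \cos(\omega_k (\frac{M-1}{2} - n)) + h[\frac{M-1}{2}] \right) if M odd

Removing linear phase from both sides yields

A(\omega_k) = 2 \sum_{n=0}^{\frac{M}{2}-1} h[n] \cos(\omega_k (\frac{M-1}{2} - n)) if M even
A(\omega_k) = 2 \sum_{n=0}^{\frac{M-3}{2}} h[n] \cos(\omega_k (\frac{M-1}{2} - n)) + h[\frac{M-1}{2}] if M odd


Due to symmetry of response for real coefficients, only the \omega_k on \omega \in [0, \pi) need be
specified, with the frequencies -\omega_k thereby being implicitly defined also. Thus we have
about \frac{M}{2} real-valued simultaneous linear equations to solve for h[n].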


Special Case 2a 


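h[n] symmetric, odd length, linear phase, real coefficients, and \omega_k equally spaced:
\forall k, 0 \le k \le M-1 : \omega_k = \frac{2\pi k}{M}
Equation:

h[n] = IDFT[H_d(\omega_k)]
     = \frac{1}{M} \sum_{k=0}^{M-1} A(\omega_k) e^{-(i\frac{2\pi k}{M}\frac{M-1}{2})} e^{i\frac{2\pi n k}{M}}
     = \frac{1}{M} \sum_{k=0}^{M-1} A(k) e^{i\frac{2\pi k}{M}(n - \frac{M-1}{2})}


To yield real coefficients, A(\omega) must be symmetric
(A(\omega) = A(-\omega)) \Rightarrow (A[k] = A[M - k])


Equation:


h[n] = \frac{1}{M} \left( A(0) + \sum_{k=1}^{\frac{M-1}{2}} A[k] \left( e^{i\frac{2\pi k}{M}(n - \frac{M-1}{2})} + e^{-(i\frac{2\pi k}{M}(n - \frac{M-1}{2}))} \right) \right)
     = \frac{1}{M} \left( A(0) + 2 \sum_{k=1}^{\frac{M-1}{2}} A[k] \cos(\frac{2\pi k}{M}(n - \frac{M-1}{2})) \right)
     = \frac{1}{M} \left( A(0) + 2 \sum_{k=1}^{\frac{M-1}{2}} A[k] (-1)^k \cos(\frac{2\pi k}{M}(n + \frac{1}{2})) \right)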


Similar equations exist for even lengths, anti-symmetric, and \alpha = \frac{1}{2} filter forms.


Comments on frequency-sampled design 


This method is simple conceptually and very efficient for equally spaced samples, since
h[n] can be computed using the IDFT.


H(\omega) for a frequency sampled design goes exactly through the sample points, but it may
be very far off from the desired response for \omega \ne \omega_k. This is the main problem with
frequency sampled design.


Possible solution to this problem: specify more frequency samples than degrees of 
freedom, and minimize the total error in the frequency response at all of these samples. 


Extended frequency sample design 


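For the samples H(\omega_k) where 0 \le k \le N - 1 and N > M, find h[n], where
0 \le n \le M - 1, minimizing \| H_d(\omega_k) - H(\omega_k) \|


For the \| l \|_\infty norm, this becomes a linear programming problem (standard packages
available!)


Here we will consider the \| l \|_2 norm.


To minimize the \| l \|_2 norm; that is, \sum_{k=0}^{N-1} |H_d(\omega_k) - H(\omega_k)|^2, we have an
overdetermined set of linear equations:


\begin{pmatrix}
e^{-(i\omega_0 0)} & \cdots & e^{-(i\omega_0 (M-1))} \\
\vdots & & \vdots \\
e^{-(i\omega_{N-1} 0)} & \cdots & e^{-(i\omega_{N-1} (M-1))}
\end{pmatrix} h
=
\begin{pmatrix} H_d(\omega_0) \\ H_d(\omega_1) \\ \vdots \\ H_d(\omega_{N-1}) \end{pmatrix}


or


W h = H_d


The minimum error norm solution is well known to be h = (W^H W)^{-1} W^H H_d, where W^H denotes
the conjugate transpose of W; (W^H W)^{-1} W^H is well known as the pseudo-inverse matrix.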


Note: Extended frequency sampled design discourages radical behavior of the frequency
response between samples for sufficiently closely spaced samples. However, the actual
frequency response may no longer pass exactly through any of the H_d(\omega_k).
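
The pseudo-inverse solution can be sketched without a linear algebra library by forming the normal equations (W^H W) h = W^H H_d and solving them with Gaussian elimination. The helper names, the N = 8 equally spaced frequencies, and the length-3 target filter below are assumptions made for this illustration; a production design would normally use a numerically more robust least-squares routine.

```python
import cmath
import math


def solve(A, b):
    """Solve the square complex system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def least_squares_fir(freqs, Hd, M):
    """Length-M filter minimizing sum_k |Hd[k] - H(freqs[k])|^2 via the normal equations."""
    N = len(freqs)
    W = [[cmath.exp(-1j * w * n) for n in range(M)] for w in freqs]
    WhW = [[sum(W[k][r].conjugate() * W[k][c] for k in range(N)) for c in range(M)]
           for r in range(M)]
    WhH = [sum(W[k][r].conjugate() * Hd[k] for k in range(N)) for r in range(M)]
    return solve(WhW, WhH)


# Consistency check: samples generated by a known length-3 filter are fit exactly.
h_true = [0.25, 0.5, 0.25]
freqs = [2.0 * math.pi * k / 8 for k in range(8)]
Hd = [sum(h_true[n] * cmath.exp(-1j * w * n) for n in range(3)) for w in freqs]
h_est = least_squares_fir(freqs, Hd, 3)
```

When the desired samples cannot be matched by any length-M filter, the same call returns the least-squares compromise instead of an exact fit.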


Parks-McClellan FIR Filter Design 


The approximation tolerances for a filter are very often given in terms of the maximum, or
worst-case, deviation within frequency bands. For example, we might wish a lowpass filter in a
(16-bit) CD player to have no more than 1/2-bit deviation in the pass and stop bands.


1 - \frac{1}{2^{17}} \le |H(\omega)| \le 1 + \frac{1}{2^{17}} if |\omega| \le \omega_p
\frac{1}{2^{17}} \ge |H(\omega)| if \omega_s \le |\omega| \le \pi


The Parks-McClellan filter design method efficiently designs linear-phase FIR filters that are 
optimal in terms of worst-case (minimax) error. Typically, we would like to have the shortest- 
length filter achieving these specifications. Figure [link] illustrates the amplitude frequency 
response of such a filter. 


Figure: The black boxes on the left and right are the passbands, the black boxes in the middle
represent the stop band, and the spaces between the boxes are the transition bands. Note that
overshoots may be allowed in the transition bands.


Exercise: 


Problem: Must there be a transition band? 


Solution: 


Yes, when the desired response is discontinuous. Since the frequency response of a finite- 
length filter must be continuous, without a transition band the worst-case error could be no 
less than half the discontinuity. 


Formal Statement of the L-∞ (Minimax) Design Problem


For a given filter length (M) and type (odd length, symmetric, linear phase, for example), and a
relative error weighting function W(\omega), find the filter coefficients minimizing the maximum
error


\arg\min_h \max_{\omega \in F} |E(\omega)| = \arg\min_h \| E(\omega) \|_\infty


where
E(\omega) = W(\omega) (H_d(\omega) - H(\omega))


and F is a compact subset of \omega \in [0, \pi] (i.e., all \omega in the passbands and stop bands).


Note: Typically, we would often rather specify \| E(\omega) \|_\infty \le \delta and minimize over M and h;
however, the design techniques minimize \delta for a given M. One then repeats the design
procedure for different M until the minimum M satisfying the requirements is found.


We will discuss in detail the design only of odd-length symmetric linear-phase FIR filters. 
Even-length and anti-symmetric linear phase FIR filters are essentially the same except for a 
slightly different implicit weighting function. For arbitrary phase, exactly optimal design 
procedures have only recently been developed (1990). 


Outline of L-∞ Filter Design


The Parks-McClellan method adopts an indirect method for finding the minimax-optimal filter
coefficients.


1. Using results from Approximation Theory, simple conditions for determining whether a
given filter is L^\infty (minimax) optimal are found.

2. An iterative method for finding a filter which satisfies these conditions (and which is thus
optimal) is developed.


That is, the L^\infty filter design problem is actually solved indirectly.


Conditions for L-∞ Optimality of a Linear-phase FIR Filter


All conditions are based on Chebyshev's "Alternation Theorem," a mathematical fact from
polynomial approximation theory.


Alternation Theorem


Let F be a compact subset on the real axis x, and let P(x) be an Lth-order polynomial


P(x) = \sum_{k=0}^{L} a_k x^k


Also, let D(x) be a desired function of x that is continuous on F, and W(x) a positive,
continuous weighting function on F. Define the error E(x) on F as

E(x) = W(x) (D(x) - P(x))

and


\| E(x) \|_\infty = \max_{x \in F} |E(x)|


A necessary and sufficient condition that P(x) is the unique Lth-order polynomial minimizing
\| E(x) \|_\infty is that E(x) exhibits at least L + 2 "alternations;" that is, there must exist at least
L + 2 values of x, x_k \in F, k = [0, 1, \dots, L + 1], such that x_0 < x_1 < \dots < x_{L+1} and such
that E(x_k) = -E(x_{k+1}) = \pm \| E \|_\infty

Exercise: 


Problem: What does this have to do with linear-phase filter design? 
Solution: 


It's the same problem! To show that, consider an odd-length, symmetric linear phase filter.

$$H(\omega) = \sum_{n=0}^{M-1} h(n) e^{-i\omega n} = e^{-i\omega\frac{M-1}{2}} \left( h\left(\frac{M-1}{2}\right) + 2\sum_{n=1}^{L} h\left(\frac{M-1}{2} - n\right) \cos(\omega n) \right)$$

$$A(\omega) = h(L) + 2\sum_{n=1}^{L} h(L - n) \cos(\omega n)$$

where $L = \frac{M-1}{2}$.

Using trigonometric identities (such as
$\cos(n\alpha) = 2\cos((n-1)\alpha)\cos(\alpha) - \cos((n-2)\alpha)$), we can rewrite $A(\omega)$ as

$$A(\omega) = h(L) + 2\sum_{n=1}^{L} h(L - n) \cos(\omega n) = \sum_{k=0}^{L} \alpha_k \cos^k(\omega)$$

where the $\alpha_k$ are related to the h(n) by a linear transformation. Now, let $x = \cos(\omega)$. This
is a one-to-one mapping from $x \in [-1, 1]$ onto $\omega \in [0, \pi]$. Thus $A(\omega)$ is an Lth-order
polynomial in $x = \cos(\omega)$!
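The change of basis described above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `amplitude_as_polynomial` and the example coefficients are illustrative only. It uses the fact that cos(nω) is the nth Chebyshev polynomial evaluated at cos(ω), so the cosine series can be converted to powers of x = cos(ω).

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from numpy.polynomial import polynomial as poly

def amplitude_as_polynomial(h):
    """Return alpha_k (lowest order first) with A(w) = sum_k alpha_k cos(w)**k
    for an odd-length, symmetric impulse response h."""
    L = (len(h) - 1) // 2
    # A(w) = h(L) + 2 sum_{n=1}^{L} h(L-n) cos(n w), and cos(n w) = T_n(cos w).
    cheb_coeffs = np.empty(L + 1)
    cheb_coeffs[0] = h[L]
    for n in range(1, L + 1):
        cheb_coeffs[n] = 2.0 * h[L - n]
    return cheb.cheb2poly(cheb_coeffs)

# Hypothetical symmetric filter of length M = 7 (so L = 3).
h = np.array([0.1, -0.2, 0.3, 0.5, 0.3, -0.2, 0.1])
L = (len(h) - 1) // 2
alpha = amplitude_as_polynomial(h)

w = np.linspace(0.0, np.pi, 50)
A_direct = h[L] + 2.0 * sum(h[L - n] * np.cos(w * n) for n in range(1, L + 1))
A_poly = poly.polyval(np.cos(w), alpha)   # polynomial in x = cos(w)
```

The two evaluations agree to rounding error, which is the sense in which the filter design problem is a polynomial approximation problem.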


Note: The alternation theorem holds for the L∞ filter design problem, too!


Therefore, to determine whether or not a length-M, odd-length, symmetric linear-phase
filter is optimal in an L∞ sense, simply count the alternations in
$E(\omega) = W(\omega)\left(A_d(\omega) - A(\omega)\right)$ in the pass and stop bands. If there are
$L + 2 = \frac{M+3}{2}$ or more alternations, h(n), $0 \le n \le M - 1$, is the optimal filter!


Optimality Conditions for Even-length Symmetric Linear-phase Filters 


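For M even,

$$A(\omega) = 2\sum_{n=0}^{L} h(L - n) \cos\left(\omega\left(n + \frac{1}{2}\right)\right)$$

where $L = \frac{M}{2} - 1$. Using the trigonometric identity
$\cos(\alpha + \beta) = 2\cos(\alpha)\cos(\beta) - \cos(\alpha - \beta)$ to pull out the $\frac{\omega}{2}$ term and then using the other
trig identities, it can be shown that $A(\omega)$ can be written as

$$A(\omega) = \cos\left(\frac{\omega}{2}\right) \sum_{k=0}^{L} \alpha_k \cos^k(\omega)$$

Again, this is a polynomial in $x = \cos(\omega)$, except for a weighting function out in front.

$$E(\omega) = W(\omega)\left(A_d(\omega) - A(\omega)\right) = W(\omega)\left(A_d(\omega) - \cos\left(\frac{\omega}{2}\right)P(\omega)\right) = W(\omega)\cos\left(\frac{\omega}{2}\right)\left(\frac{A_d(\omega)}{\cos\left(\frac{\omega}{2}\right)} - P(\omega)\right)$$

which implies

$$E(x) = W'(x)\left(A_d'(x) - P(x)\right)$$

where

$$W'(x) = W\left(\cos^{-1}(x)\right) \cos\left(\frac{1}{2}\cos^{-1}(x)\right)$$

and

$$A_d'(x) = \frac{A_d\left(\cos^{-1}(x)\right)}{\cos\left(\frac{1}{2}\cos^{-1}(x)\right)}$$

Again, this is a polynomial approximation problem, so the alternation theorem holds. If $E(\omega)$
has at least $L + 2 = \frac{M}{2} + 1$ alternations, the even-length symmetric filter is optimal in an L∞
sense.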


The prototypical filter design problem: 
1 if jw) < wp 
W5 | & if feel < lal 


See [link]. 


L-œ Optimal Lowpass Filter Design Lemma 


1. The maximum possible number of alternations for a lowpass filter is L + 3: The proof is
that the extrema of a polynomial occur only where the derivative is zero: $\frac{dP(x)}{dx} = 0$. Since
$P'(x)$ is an (L − 1)th-order polynomial, it can have at most L − 1 zeros. However, the
mapping $x = \cos(\omega)$ implies that $\frac{dA(\omega)}{d\omega} = 0$ at $\omega = 0$ and $\omega = \pi$, for two more possible
alternation points. Finally, the band edges can also be alternations, for a total of
L − 1 + 2 + 2 = L + 3 possible alternations.

2. There must be an alternation at either $\omega = 0$ or $\omega = \pi$.

3. Alternations must occur at $\omega_p$ and $\omega_s$. See [link].

4. The filter must be equiripple except at possibly $\omega = 0$ or $\omega = \pi$. Again see [link].


Note:The alternation theorem doesn't directly suggest a method for computing the optimal 
filter. It simply tells us how to recognize that a filter is optimal, or isn't optimal. What we need 
is an intelligent way of guessing the optimal filter coefficients. 


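Suppose we guess a set of L + 2 extremal frequencies $\omega_k$ and require the weighted error to
alternate in sign with a common magnitude $\delta$ at those frequencies:

$$A(\omega_k) + \frac{(-1)^k \delta}{W(\omega_k)} = A_d(\omega_k), \quad k = 0, 1, \ldots, L + 1$$

In matrix form, these L + 2 simultaneous equations become

$$\begin{pmatrix} 1 & \cos(\omega_0) & \cos(2\omega_0) & \cdots & \cos(L\omega_0) & \frac{1}{W(\omega_0)} \\ 1 & \cos(\omega_1) & \cos(2\omega_1) & \cdots & \cos(L\omega_1) & \frac{-1}{W(\omega_1)} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & \cos(\omega_{L+1}) & \cos(2\omega_{L+1}) & \cdots & \cos(L\omega_{L+1}) & \frac{(-1)^{L+1}}{W(\omega_{L+1})} \end{pmatrix} \begin{pmatrix} h(L) \\ 2h(L-1) \\ \vdots \\ 2h(0) \\ \delta \end{pmatrix} = \begin{pmatrix} A_d(\omega_0) \\ A_d(\omega_1) \\ \vdots \\ A_d(\omega_{L+1}) \end{pmatrix}$$

or

$$\mathbf{W} \begin{pmatrix} \mathbf{h} \\ \delta \end{pmatrix} = \mathbf{A}_d$$

where $\mathbf{h}$ collects the scaled filter coefficients shown above. So, for the given set of L + 2
extremal frequencies, we can solve for h and $\delta$ via
$(\mathbf{h} \; \delta)^T = \mathbf{W}^{-1}\mathbf{A}_d$. Using the FFT, we can compute $A(\omega)$ of h(n) on a dense set of
frequencies. If the old $\omega_k$ are, in fact, the extremal locations of $A(\omega)$, then the alternation
theorem is satisfied and h(n) is optimal. If not, repeat the process with the new extremal
locations.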
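One step of this direct approach can be sketched in a few lines. This is an illustrative sketch rather than a production design routine: it assumes NumPy, uses a hypothetical helper `solve_on_extremals`, and writes the unknowns as cosine-series coefficients c_n (with c_0 = h(L) and c_n = 2h(L − n), an assumption made so that A(ω) matches the amplitude response defined earlier). The band edges and the initial guess are made-up values.

```python
import numpy as np

def solve_on_extremals(wk, Ad, W):
    """Solve A(w_k) + (-1)^k delta / W(w_k) = Ad(w_k), k = 0..L+1.

    Returns (c, delta), where A(w) = sum_n c[n] cos(n w)."""
    wk = np.asarray(wk, dtype=float)
    L = len(wk) - 2
    cos_part = np.cos(np.outer(wk, np.arange(L + 1)))   # (L+2) x (L+1)
    signs = (-1.0) ** np.arange(L + 2)
    last_col = (signs / W(wk))[:, None]                 # alternating 1/W column
    system = np.hstack([cos_part, last_col])
    solution = np.linalg.solve(system, Ad(wk))
    return solution[:-1], solution[-1]

# Hypothetical lowpass specification: desired response 1 below pi/2, 0 above.
Ad = lambda w: np.where(w <= 0.5 * np.pi, 1.0, 0.0)
W = lambda w: np.ones_like(w)

# Initial guess of L + 2 = 6 extremal frequencies (so L = 4).
wk = np.pi * np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
c, delta = solve_on_extremals(wk, Ad, W)

# The weighted error at the guessed frequencies alternates with magnitude |delta|.
A_at_wk = np.cos(np.outer(wk, np.arange(len(c)))) @ c
error = W(wk) * (Ad(wk) - A_at_wk)
```

A full design loop would next evaluate the error on a dense grid, pick the new extremal frequencies, and repeat until they stop moving.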


Computational Cost 


$O(L^3)$ for the matrix inverse and $N \log_2 N$ for the FFT ($N \ge 32L$, typically), per iteration!
This method is expensive computationally due to the matrix inverse.

A more efficient variation of this method was developed by Parks and McClellan (1972), and is
based on the Remez exchange algorithm. To understand the Remez exchange algorithm, we
first need to understand Lagrange interpolation.


Now $A(\omega)$ is an Lth-order polynomial in $x = \cos(\omega)$, so Lagrange interpolation can be used to
exactly compute $A(\omega)$ from L + 1 samples $A(\omega_k)$, k = [0, 1, 2, ..., L].

Thus, given a set of extremal frequencies and knowing $\delta$, samples of the amplitude response
$A(\omega)$ can be computed directly from

$$A(\omega_k) = A_d(\omega_k) - \frac{(-1)^k}{W(\omega_k)}\delta$$

without solving for the filter coefficients!
This leads to computational savings!


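Note that [link] is a set of L + 2 simultaneous equations, which can be solved for $\delta$ to obtain
(Rabiner, 1975)

$$\delta = \frac{\sum_{k=0}^{L+1} \gamma_k A_d(\omega_k)}{\sum_{k=0}^{L+1} \frac{(-1)^k \gamma_k}{W(\omega_k)}}$$

where

$$\gamma_k = \prod_{i=0,\, i \ne k}^{L+1} \frac{1}{\cos(\omega_k) - \cos(\omega_i)}$$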

The result is the Parks-McClellan FIR filter design method, which is simply an application of 
the Remez exchange algorithm to the filter design problem. See [link]. 


Figure (flowchart of the Parks-McClellan algorithm):

1. Initial guess of L + 2 extremal frequencies.
2. Compute $\delta$ using the equation given.
3. Using Lagrange interpolation, compute a dense set of samples (typically 16L) of $A(\omega)$ over the
pass and stop bands.
4. Determine the new L + 2 largest extrema.
5. Alternation theorem satisfied? If not, return to step 2; if so, compute h(n).

The initial guess of extremal frequencies is usually equally spaced in the band.
Computing $\delta$ costs $O(L^2)$. Using Lagrange interpolation costs $O(16L \cdot L) \approx O(16L^2)$.
Computing h(n) costs $O(L^3)$, but it is only done once!


The cost per iteration is $O(16L^2)$, as opposed to $O(L^3)$; this is much more efficient for large L. One can
also interpolate to DFT sample frequencies, take an inverse FFT to get the corresponding filter
coefficients, and zero-pad and take a longer FFT to interpolate efficiently.


Overview of IIR Filter Design 


IIR Filter 


$$y(n) = -\sum_{k=1}^{M} a_k y(n - k) + \sum_{k=0}^{M} b_k x(n - k)$$

$$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2} + \ldots + b_M z^{-M}}{1 + a_1 z^{-1} + a_2 z^{-2} + \ldots + a_M z^{-M}}$$


IIR Filter Design Problem 


Choose $\{a_i\}$, $\{b_i\}$ to best approximate some desired $|H_d(\omega)|$ or,
(occasionally), $H_d(\omega)$.


As before, different design techniques will be developed for different 
approximation criteria. 


Outline of IIR Filter Design Material 


• Bilinear Transform: Maps $\| L \|_\infty$ optimal (and other) analog filter
designs to $\| L \|_\infty$ optimal digital IIR filter designs.

• Prony's Method: Quasi-$\| L \|_2$ optimal method for time-domain
fitting of a desired impulse response (ad hoc).

• Lp Optimal Design: $\| L \|_p$ optimal filter design ($1 < p < \infty$) using
non-linear optimization techniques.


Comments on IIR Filter Design Methods 


The bilinear transform method is used to design "typical" $\| L \|_\infty$
magnitude optimal filters. The $\| L \|_p$ optimization procedures are used to
design filters for which classical analog prototype solutions don't exist. The
program by Deczky (DSP Programs Book, IEEE Press) is widely used.
Prony/Linear Prediction techniques are used often to obtain initial guesses,
and are almost exclusively used in data modeling, system identification, and
most applications involving the fitting of real data (for example, the
impulse response of an unknown filter).


Prototype Analog Filter Design 


Analog Filter Design 


Laplace transform:

$$H_a(s) = \int_{-\infty}^{\infty} h_a(t) e^{-st} \, dt$$

Note that the continuous-time Fourier transform is $H(i\lambda)$ (the Laplace transform
evaluated on the imaginary axis).


Since the early 1900's, there has been a lot of research on designing analog filters of the
form

$$H(s) = \frac{b_0 + b_1 s + b_2 s^2 + \ldots + b_M s^M}{1 + a_1 s + a_2 s^2 + \ldots + a_M s^M}$$


A causal IIR filter cannot have linear phase (no possible symmetry point), and design
work for analog filters has concentrated on designing filters with equiripple ($\| L \|_\infty$)
magnitude responses. These design problems have been solved. We will not concern
ourselves here with the design of the analog prototype filters, only with how these 
designs are mapped to discrete-time while preserving optimality. 


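An analog filter with real coefficients must have a magnitude response of the form

$$|H(\lambda)|^2 = B(\lambda^2)$$

$$|H(\lambda)|^2 = H(i\lambda)\overline{H(i\lambda)} = \frac{b_0 + b_1 i\lambda + b_2 (i\lambda)^2 + b_3 (i\lambda)^3 + \ldots}{1 + a_1 i\lambda + a_2 (i\lambda)^2 + \ldots} \, \overline{H(i\lambda)}$$

$$= \frac{b_0 - b_2\lambda^2 + b_4\lambda^4 + \ldots + i\lambda\left(b_1 - b_3\lambda^2 + b_5\lambda^4 + \ldots\right)}{1 - a_2\lambda^2 + a_4\lambda^4 + \ldots + i\lambda\left(a_1 - a_3\lambda^2 + a_5\lambda^4 + \ldots\right)} \cdot \frac{b_0 - b_2\lambda^2 + b_4\lambda^4 + \ldots - i\lambda\left(b_1 - b_3\lambda^2 + b_5\lambda^4 + \ldots\right)}{1 - a_2\lambda^2 + a_4\lambda^4 + \ldots - i\lambda\left(a_1 - a_3\lambda^2 + a_5\lambda^4 + \ldots\right)}$$

$$= \frac{\left(b_0 - b_2\lambda^2 + b_4\lambda^4 + \ldots\right)^2 + \lambda^2\left(b_1 - b_3\lambda^2 + b_5\lambda^4 + \ldots\right)^2}{\left(1 - a_2\lambda^2 + a_4\lambda^4 + \ldots\right)^2 + \lambda^2\left(a_1 - a_3\lambda^2 + a_5\lambda^4 + \ldots\right)^2}$$

$$= B(\lambda^2)$$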


Let $s = i\lambda$; note that the poles and zeros of $B(-s^2)$ are symmetric around both the
real and imaginary axes: that is, a pole at $p_1$ implies poles at $p_1$, $\overline{p_1}$, $-p_1$, and $-\overline{p_1}$, as
seen in [link].


Recall that an analog filter is stable and causal if all the poles are in the left half-plane, 
LHP, and is minimum phase if all zeros and poles are in the LHP. 


With $s = i\lambda$: $B(\lambda^2) = B(-s^2) = H(s)H(-s) = H(i\lambda)H(-(i\lambda)) = H(i\lambda)\overline{H(i\lambda)}$, so we
can factor $B(-s^2)$ into $H(s)H(-s)$, where H(s) has the left half plane poles and
zeros, and H(−s) has the RHP poles and zeros.

$|H(s)|^2 = H(s)H(-s)$ for $s = i\lambda$, so H(s) has the magnitude response $B(\lambda^2)$.
The trick to analog filter design is to design a good $B(\lambda^2)$, then factor this to obtain a
filter with that magnitude response.


The traditional analog filter designs all take the form

$$B(\lambda^2) = |H(\lambda)|^2 = \frac{1}{1 + F(\lambda^2)}$$

where F is a rational function in $\lambda^2$.


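Example:

$$B(\lambda^2) = \frac{2 + \lambda^2}{1 + \lambda^4}$$

$$B(-s^2) = \frac{2 - s^2}{1 + s^4} = \frac{\left(\sqrt{2} - s\right)\left(\sqrt{2} + s\right)}{(s + \alpha)(s - \alpha)(s + \overline{\alpha})(s - \overline{\alpha})}$$

where $\alpha = \frac{1 + i}{\sqrt{2}}$.

Note: Roots of $1 + s^N$ are N points equally spaced around the unit circle ([link]).

Take H(s) = LHP factors:

$$H(s) = \frac{\sqrt{2} + s}{(s + \alpha)(s + \overline{\alpha})} = \frac{\sqrt{2} + s}{s^2 + \sqrt{2}s + 1}$$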
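The factoring step in this example can be reproduced numerically. The sketch below (assuming NumPy; variable names are illustrative) finds the roots of the numerator and denominator of B(−s²), keeps the left-half-plane ones, and confirms that the resulting H(s) has squared magnitude B(λ²) on the imaginary axis.

```python
import numpy as np

# B(-s^2) = (2 - s^2) / (1 + s^4); coefficients are listed highest power first.
den_roots = np.roots([1.0, 0.0, 0.0, 0.0, 1.0])   # roots of s^4 + 1
num_roots = np.roots([-1.0, 0.0, 2.0])            # roots of 2 - s^2

# Keep only the left-half-plane poles and zeros for H(s).
lhp_poles = den_roots[den_roots.real < 0]
lhp_zeros = num_roots[num_roots.real < 0]

den = np.real(np.poly(lhp_poles))   # expected: s^2 + sqrt(2) s + 1
num = np.real(np.poly(lhp_zeros))   # expected: s + sqrt(2)

# Compare |H(i lambda)|^2 with B(lambda^2) on a grid of analog frequencies.
lam = np.linspace(0.0, 5.0, 101)
H = np.polyval(num, 1j * lam) / np.polyval(den, 1j * lam)
B = (2.0 + lam**2) / (1.0 + lam**4)
```

Choosing the left-half-plane roots is what makes the resulting filter both stable and minimum phase.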


Traditional Filter Designs 


Butterworth

$$B(\lambda^2) = \frac{1}{1 + \lambda^{2M}}$$

Note: Remember this for homework and test problems!

"Maximally smooth" at $\lambda = 0$ and $\lambda = \infty$ (maximum possible number of zero
derivatives). [link].


Chebyshev 


$$B(\lambda^2) = \frac{1}{1 + \epsilon^2 C_M^2(\lambda)}$$

where $C_M(\lambda)$ is an Mth-order Chebyshev polynomial. [link].


Inverse Chebyshev 


[link]. 


Elliptic Function Filter (Cauer Filter) 


$$B(\lambda^2) = \frac{1}{1 + \epsilon^2 J_M^2(\lambda)}$$

where $J_M$ is the "Jacobi Elliptic Function." [link].

The Cauer filter is $\| L \|_\infty$ optimum in the sense that for a given M, $\delta_p$, $\delta_s$, and $\lambda_p$, the
transition bandwidth is smallest.

That is, it is $\| L \|_\infty$ optimal.


IIR Digital Filter Design via the Bilinear Transform 


A bilinear transform maps an analog filter H,(s) to a discrete-time filter 
H(z) of the same order. 


If only we could somehow map these optimal analog filter designs to the 
digital world while preserving the magnitude response characteristics, we 
could make use of the already-existing body of knowledge concerning 
optimal analog filter design. 


Bilinear Transformation 


The Bilinear Transform is a nonlinear $\mathbb{C} \to \mathbb{C}$ mapping that maps a
function of the complex variable s to a function of a complex variable z.
This map has the property that the LHP in s ($\Re(s) < 0$) maps to the
interior of the unit circle in z, and the $i\lambda = s$ axis maps to the unit circle $e^{i\omega}$
in z.


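Bilinear transform:

$$s = \alpha \frac{z - 1}{z + 1}$$

$$H(z) = H_a\left(s = \alpha \frac{z - 1}{z + 1}\right)$$

Note: $i\lambda = \alpha \frac{e^{i\omega} - 1}{e^{i\omega} + 1} = \alpha \frac{\left(e^{i\omega} - 1\right)\left(e^{-i\omega} + 1\right)}{\left(e^{i\omega} + 1\right)\left(e^{-i\omega} + 1\right)} = \alpha \frac{2i\sin(\omega)}{2 + 2\cos(\omega)} = i\alpha\tan\left(\frac{\omega}{2}\right)$, so

$$\lambda = \alpha \tan\left(\frac{\omega}{2}\right), \quad \omega = 2\arctan\left(\frac{\lambda}{\alpha}\right)$$

[link].

The magnitude response doesn't change in the mapping from $\lambda$ to $\omega$; it is
simply warped nonlinearly according to $H(\omega) = H_a\left(\alpha \tan\left(\frac{\omega}{2}\right)\right)$, [link].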
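The warping relation can be verified on a small case. The following is a minimal sketch, assuming NumPy and a hypothetical first-order prototype H_a(s) = 1/(1 + s); the helper name `bilinear_first_order` and the value of α are illustrative. It substitutes s = α(1 − z⁻¹)/(1 + z⁻¹) by hand and checks that the digital response at ω equals the analog response at λ = α tan(ω/2).

```python
import numpy as np

def bilinear_first_order(b, a, alpha):
    """Map H_a(s) = (b0 + b1 s)/(a0 + a1 s) to H(z) using s = alpha (z-1)/(z+1).

    Returns numerator and denominator coefficients in powers of z^-1."""
    b0, b1 = b
    a0, a1 = a
    # Substitute s = alpha (1 - z^-1)/(1 + z^-1), then multiply through by (1 + z^-1).
    num = np.array([b0 + b1 * alpha, b0 - b1 * alpha])
    den = np.array([a0 + a1 * alpha, a0 - a1 * alpha])
    return num, den

alpha = 2.0
num, den = bilinear_first_order((1.0, 0.0), (1.0, 1.0), alpha)   # H_a(s) = 1/(1 + s)

w = np.linspace(0.0, 0.99 * np.pi, 200)
q = np.exp(-1j * w)                                  # z^-1 on the unit circle
H_digital = (num[0] + num[1] * q) / (den[0] + den[1] * q)

lam = alpha * np.tan(w / 2.0)                        # warped analog frequency
H_analog = 1.0 / (1.0 + 1j * lam)
```

The two responses match sample for sample: the values of the analog response are preserved and only the frequency axis is compressed.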


The first image implies the second one. 




Note: This mapping preserves $\| L \|_\infty$ errors in (warped) frequency bands.
Thus optimal Cauer ($\| L \|_\infty$) filters in the analog realm can be mapped to
$\| L \|_\infty$ optimal discrete-time IIR filters using the bilinear transform! This
is how IIR filters with $\| L \|_\infty$ optimal magnitude responses are designed.


Note: The parameter $\alpha$ provides one degree of freedom which can be used
to map a single $\lambda_0$ to any desired $\omega_0$:

$$\lambda_0 = \alpha \tan\left(\frac{\omega_0}{2}\right)$$

or

$$\alpha = \frac{\lambda_0}{\tan\left(\frac{\omega_0}{2}\right)}$$

This can be used, for example, to map the pass-band edge of a lowpass
analog prototype filter to any desired pass-band edge in $\omega$. Often, analog
prototype filters will be designed with $\lambda = 1$ as a band edge, and $\alpha$ will be
used to locate the band edge in $\omega$. Thus an Mth-order optimal lowpass
analog filter prototype can be used to design any Mth-order discrete-time
lowpass IIR filter with the same ripple specifications.


Prewarping 


Given specifications on the frequency response of an IIR filter to be 
designed, map these to specifications in the analog frequency domain which 
are equivalent. Then a satisfactory analog prototype can be designed which, 
when transformed to discrete-time using the bilinear transformation, will 
meet the specifications. 


Example: 

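The goal is to design a high-pass filter with given band edges $\omega_s$ and $\omega_p$ and
ripples $\delta_s$ and $\delta_p$; pick some $\alpha = \alpha_0$. In [link] the $\delta_i$ remain the same and the band edges
are mapped by $\lambda_i = \alpha_0 \tan\left(\frac{\omega_i}{2}\right)$,

where $\lambda_s = \alpha_0 \tan\left(\frac{\omega_s}{2}\right)$ and $\lambda_p = \alpha_0 \tan\left(\frac{\omega_p}{2}\right)$.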
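Prewarping amounts to a couple of one-line formulas. The sketch below uses only the Python standard library; the function names and the band edges are hypothetical, and α₀ is chosen (as one possible convention) so that the passband edge lands at λ = 1.

```python
import math

def prewarp(omega, alpha):
    """Analog frequency that the bilinear transform maps to digital frequency omega."""
    return alpha * math.tan(omega / 2.0)

def unwarp(lam, alpha):
    """Digital frequency that analog frequency lam maps to."""
    return 2.0 * math.atan(lam / alpha)

# Hypothetical high-pass specification (radians per sample).
omega_s = 0.3 * math.pi   # stopband edge
omega_p = 0.5 * math.pi   # passband edge

# Choose alpha_0 so that the analog passband edge is lambda_p = 1.
alpha_0 = 1.0 / math.tan(omega_p / 2.0)
lambda_p = prewarp(omega_p, alpha_0)
lambda_s = prewarp(omega_s, alpha_0)
```

An analog prototype designed to meet the ripple specifications at `lambda_s` and `lambda_p` will then meet them at `omega_s` and `omega_p` after the bilinear transform.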




Impulse-Invariant Design 


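A pre-classical, ad hoc but easy method of converting an analog prototype filter
to a digital IIR filter. It does not preserve any optimality.

Impulse invariance means that the digital filter impulse response exactly equals
samples of the analog prototype impulse response:

$$\forall n: \; h(n) = h_a(nT)$$

How is this done?

The impulse response of a causal, stable analog filter is simply a sum of
decaying exponentials:

$$H_a(s) = \frac{b_0 + b_1 s + b_2 s^2 + \ldots + b_p s^p}{1 + a_1 s + a_2 s^2 + \ldots + a_p s^p} = \frac{A_1}{s - s_1} + \frac{A_2}{s - s_2} + \ldots + \frac{A_p}{s - s_p}$$

which implies

$$h_a(t) = \left(A_1 e^{s_1 t} + A_2 e^{s_2 t} + \ldots + A_p e^{s_p t}\right) u(t)$$

For impulse invariance, we desire

$$h(n) = h_a(nT) = \left(A_1 e^{s_1 nT} + A_2 e^{s_2 nT} + \ldots + A_p e^{s_p nT}\right) u(n)$$

Since

$$A_k e^{(s_k T)n} u(n) \leftrightarrow \frac{A_k z}{z - e^{s_k T}}$$

where $|z| > |e^{s_k T}|$, and

$$H(z) = \sum_{k=1}^{p} \frac{A_k z}{z - e^{s_k T}}$$

where $|z| > \max_k \left\{ |e^{s_k T}| \right\}$.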
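For a single pole the construction is easy to check by simulation. This is a minimal sketch using only the standard library; the function names and the values of A, s₁, and T are assumptions for illustration. Each term A/(s − s₁) becomes A z/(z − e^{s₁T}), which is the recursion y(n) = e^{s₁T} y(n − 1) + A x(n).

```python
import cmath
import math

def impulse_invariant_one_pole(A, s1, T):
    """H_a(s) = A/(s - s1)  ->  H(z) = A z/(z - e^{s1 T}); returns (gain, digital pole)."""
    return A, cmath.exp(s1 * T)

def impulse_response(gain, pole, count):
    """Run y(n) = pole * y(n-1) + gain * x(n) on a unit impulse."""
    out = []
    y_prev = 0.0
    for n in range(count):
        x = 1.0 if n == 0 else 0.0
        y = pole * y_prev + gain * x
        out.append(y)
        y_prev = y
    return out

# Hypothetical analog section: h_a(t) = 2 exp(-3 t) u(t), sampled with T = 0.1.
A, s1, T = 2.0, -3.0, 0.1
gain, pole = impulse_invariant_one_pole(A, s1, T)
h = impulse_response(gain, pole, 20)
h_expected = [A * math.exp(s1 * n * T) for n in range(20)]
```

A higher-order filter is handled by summing one such section per term of the partial fraction expansion.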


This technique is used occasionally in digital simulations of analog filters. 


Exercise: 
Problem: 
What is the main problem/drawback with this design technique? 
Solution: 
Since it samples the non-bandlimited impulse response of the analog
prototype filter, the frequency response aliases. This distorts the original
analog frequency response and destroys any optimal frequency properties in the
resulting digital filter.


Digital-to-Digital Frequency Transformations 


Given a prototype digital filter design, transformations similar to the 
bilinear transform can also be developed. 


Requirements on such a mapping $z_1^{-1} = g(z^{-1})$:

1. points inside the unit circle stay inside the unit circle (condition to
preserve stability)
2. unit circle is mapped to itself (preserves frequency response)

This condition implies that $e^{-i\omega_1} = g\left(e^{-i\omega}\right) = |g(\omega)| e^{i\angle g(\omega)}$ requires
that $\left|g\left(e^{-i\omega}\right)\right| = 1$ on the unit circle!

Thus we require an all-pass transformation:

$$g\left(z^{-1}\right) = \pm\prod_{k=1}^{p} \frac{z^{-1} - \alpha_k}{1 - \alpha_k z^{-1}}$$

where $|\alpha_k| < 1$, which is required to satisfy this condition.


Example: 
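Lowpass-to-Lowpass

$$z_1^{-1} = \frac{z^{-1} - a}{1 - a z^{-1}}$$

which maps the original filter with a cutoff at $\omega_c$ to a new filter with cutoff $\omega_c'$, where

$$a = \frac{\sin\left(\frac{1}{2}\left(\omega_c - \omega_c'\right)\right)}{\sin\left(\frac{1}{2}\left(\omega_c + \omega_c'\right)\right)}$$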
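A quick numerical check of the lowpass-to-lowpass mapping, using only the standard library. The function names and cutoff values are illustrative; the check confirms that the new cutoff frequency on the unit circle is carried to the prototype cutoff, and that the unit circle maps to itself.

```python
import cmath
import math

def lowpass_to_lowpass_a(wc_old, wc_new):
    """All-pass parameter moving a prototype cutoff wc_old to a new cutoff wc_new."""
    return math.sin(0.5 * (wc_old - wc_new)) / math.sin(0.5 * (wc_old + wc_new))

def allpass_map(z_inv, a):
    """Evaluate z1^-1 = (z^-1 - a) / (1 - a z^-1)."""
    return (z_inv - a) / (1.0 - a * z_inv)

wc_old = 0.5 * math.pi   # cutoff of the prototype filter
wc_new = 0.2 * math.pi   # desired cutoff of the transformed filter
a = lowpass_to_lowpass_a(wc_old, wc_new)

# The new filter at wc_new evaluates the prototype at wc_old.
mapped_cutoff = allpass_map(cmath.exp(-1j * wc_new), a)
# Any point on the unit circle stays on the unit circle.
mapped_other = allpass_map(cmath.exp(-1j * 1.0), a)
```

Because |a| < 1 here, points inside the unit circle also stay inside, so stability is preserved.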


Example: 
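Lowpass-to-Highpass

$$z_1^{-1} = -\frac{z^{-1} + a}{1 + a z^{-1}}$$

which maps the original filter with a cutoff at $\omega_c$ to a frequency-reversed filter
with cutoff $\omega_c'$, where

$$a = -\frac{\cos\left(\frac{1}{2}\left(\omega_c + \omega_c'\right)\right)}{\cos\left(\frac{1}{2}\left(\omega_c - \omega_c'\right)\right)}$$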


(Interesting and occasionally useful!) 


Prony's Method 
Prony's Method is a quasi-least-squares time-domain IIR filter design method. 


First, assume H(z) is an "all-pole" system:

$$H(z) = \frac{b_0}{1 + \sum_{k=1}^{M} a_k z^{-k}}$$

and

$$h(n) = -\sum_{k=1}^{M} a_k h(n - k) + b_0 \delta(n)$$

where h(n) = 0, n < 0 for a causal system.
Note: For n = 0, $h(0) = b_0$.


Let's attempt to fit a desired impulse response $h_d(n)$ (let it be causal, although one
can extend this technique when it isn't).

A true least-squares solution would attempt to minimize

$$\epsilon^2 = \sum_{n=0}^{\infty} |h_d(n) - h(n)|^2$$


where H(z) takes the form in [link]. This is a difficult non-linear 
optimization problem which is known to be plagued by local minima in the 
error surface. So instead of solving this difficult non-linear problem, we solve 
the deterministic linear prediction problem, which is related to, but not the 
same as, the true least-squares optimization. 


The deterministic linear prediction problem is a linear least-squares
optimization, which is easy to solve, but it minimizes the prediction error,
not the $|\text{desired} - \text{actual}|^2$ response error.


Notice that for n > 0, with the all-pole filter

$$h(n) = -\sum_{k=1}^{M} a_k h(n - k)$$

the right hand side of this equation is a linear predictor of h(n) in terms of
the M previous samples of h(n).


For the desired response $h_d(n)$, one can choose the recursive filter coefficients
$a_k$ to minimize the squared prediction error

$$\epsilon_p^2 = \sum_{n=1}^{\infty} \left| h_d(n) + \sum_{k=1}^{M} a_k h_d(n - k) \right|^2$$

where, in practice, the $\infty$ is replaced by an N.


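In matrix form, that's

$$\begin{pmatrix} h_d(0) & 0 & \cdots & 0 \\ h_d(1) & h_d(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h_d(N-1) & h_d(N-2) & \cdots & h_d(N-M) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{pmatrix} \approx -\begin{pmatrix} h_d(1) \\ h_d(2) \\ \vdots \\ h_d(N) \end{pmatrix}$$

or

$$H_d a \approx -h_d$$

The optimal solution is

$$a_{lp} = -\left(H_d^H H_d\right)^{-1} H_d^H h_d$$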
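The all-pole fit can be sketched with a standard least-squares solver. This is an illustrative sketch, assuming NumPy; the function name `allpole_linear_prediction` and the test filter are hypothetical. When the desired response really does come from an all-pole filter of order M, the prediction error is zero and the coefficients are recovered exactly.

```python
import numpy as np

def allpole_linear_prediction(hd, M):
    """Least-squares a_k minimizing sum_{n=1}^{N} |hd(n) + sum_k a_k hd(n-k)|^2,
    with N = len(hd) - 1 and hd(n) = 0 for n < 0."""
    hd = np.asarray(hd, dtype=float)
    N = len(hd) - 1
    Hd = np.zeros((N, M))
    for row, n in enumerate(range(1, N + 1)):
        for k in range(1, M + 1):
            if n - k >= 0:
                Hd[row, k - 1] = hd[n - k]
    # Solve Hd a ~= -hd(1..N) in the least-squares sense.
    a, *_ = np.linalg.lstsq(Hd, -hd[1:], rcond=None)
    return a

# Hypothetical all-pole filter: H(z) = 1 / (1 - 0.9 z^-1 + 0.2 z^-2).
a_true = np.array([-0.9, 0.2])
hd = np.zeros(21)
for n in range(21):
    acc = 1.0 if n == 0 else 0.0          # b0 * delta(n) with b0 = 1
    for k in range(1, 3):
        if n - k >= 0:
            acc -= a_true[k - 1] * hd[n - k]
    hd[n] = acc

a_est = allpole_linear_prediction(hd, M=2)
```

Using a least-squares routine rather than forming the normal equations explicitly is a numerical convenience; both give the same minimizer when the matrix has full column rank.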


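Now suppose H(z) is an Mth-order IIR (ARMA) system,

$$H(z) = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 + \sum_{k=1}^{M} a_k z^{-k}}$$

or

$$h(n) = -\sum_{k=1}^{M} a_k h(n - k) + \sum_{k=0}^{M} b_k \delta(n - k) = \begin{cases} -\sum_{k=1}^{M} a_k h(n - k) + b_n & \text{if } 0 \le n \le M \\ -\sum_{k=1}^{M} a_k h(n - k) & \text{if } n > M \end{cases}$$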

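For n > M, this is just like the all-pole case, so we can solve for the best
predictor coefficients as before:

$$\begin{pmatrix} h_d(M) & h_d(M-1) & \cdots & h_d(1) \\ h_d(M+1) & h_d(M) & \cdots & h_d(2) \\ \vdots & \vdots & \ddots & \vdots \\ h_d(N-1) & h_d(N-2) & \cdots & h_d(N-M) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{pmatrix} \approx -\begin{pmatrix} h_d(M+1) \\ h_d(M+2) \\ \vdots \\ h_d(N) \end{pmatrix}$$

or

$$\hat{H}_d a \approx -\hat{h}_d$$

and

$$a_{opt} = -\left(\hat{H}_d^H \hat{H}_d\right)^{-1} \hat{H}_d^H \hat{h}_d$$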


Having determined the a's, we can use them in [link] to obtain the $b_n$'s:

$$b_n = h_d(n) + \sum_{k=1}^{M} a_k h_d(n - k), \quad 0 \le n \le M$$

where $h_d(n - k) = 0$ for n − k < 0.


For N = 2M, $\hat{H}_d$ is square, and we can solve exactly for the $a_k$'s with no
error. The $b_k$'s are also chosen such that there is no error in the first M + 1
samples of h(n). Thus for N = 2M, the first 2M + 1 points of h(n) exactly
equal $h_d(n)$. This is called Prony's Method. Baron de Prony invented this in
1795.

For N > 2M, $h_d(n) = h(n)$ for $0 \le n \le M$, the prediction error is
minimized for $M + 1 \le n \le N$, and whatever for $n \ge N + 1$. This is called
the Extended Prony Method.
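Both variants fit in one short routine. The sketch below assumes NumPy; the function name `prony` and the example filter are hypothetical. The denominator is fit by linear prediction over M + 1 ≤ n ≤ N, and the numerator is then chosen to match the first M + 1 samples exactly; with N = 2M this is Prony's method, and with N > 2M the Extended Prony Method.

```python
import numpy as np

def prony(hd, M, N):
    """Fit an Mth-order ARMA model to hd(0..N); returns (a, b) with
    a = [a_1..a_M] and b = [b_0..b_M]."""
    hd = np.asarray(hd, dtype=float)
    # Denominator: hd(n) + sum_k a_k hd(n-k) ~= 0 for n = M+1..N.
    rows = range(M + 1, N + 1)
    H_hat = np.array([[hd[n - k] for k in range(1, M + 1)] for n in rows])
    rhs = -np.array([hd[n] for n in rows])
    a, *_ = np.linalg.lstsq(H_hat, rhs, rcond=None)
    # Numerator: b_n = hd(n) + sum_k a_k hd(n-k) for 0 <= n <= M.
    b = np.array([hd[n] + sum(a[k - 1] * hd[n - k]
                              for k in range(1, M + 1) if n - k >= 0)
                  for n in range(M + 1)])
    return a, b

# Hypothetical second-order filter used to generate the "desired" response.
a_true = np.array([-0.9, 0.2])
b_true = np.array([1.0, 0.5, 0.25])
M = 2
hd = np.zeros(2 * M + 1)
for n in range(len(hd)):
    acc = b_true[n] if n <= M else 0.0
    for k in range(1, M + 1):
        if n - k >= 0:
            acc -= a_true[k - 1] * hd[n - k]
    hd[n] = acc

a_est, b_est = prony(hd, M, N=2 * M)   # N = 2M: exact (classical Prony) case
```

Passing a longer `hd` and a larger `N` to the same function gives the extended, overdetermined fit.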


One might prefer a method which tries to minimize an overall error with the
numerator coefficients, rather than just using them to exactly fit $h_d(0)$ to
$h_d(M)$.


Shank's Method 


1. Assume an all-pole model and fit $h_d(n)$ by minimizing the prediction
error for $1 \le n \le N$.

2. Compute v(n), the impulse response of this all-pole filter.

3. Design an all-zero (MA, FIR) filter which fits $v(n) * h_z(n) \approx h_d(n)$
optimally in a least-squares sense ([link]).


The final IIR filter is the cascade of the all-pole and all-zero filter. 


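This is solved by

$$\min_{b_k} \sum_{n=0}^{N} \left| h_d(n) - \sum_{k=0}^{M} b_k v(n - k) \right|^2$$

or in matrix form

$$\begin{pmatrix} v(0) & 0 & 0 & \cdots & 0 \\ v(1) & v(0) & 0 & \cdots & 0 \\ v(2) & v(1) & v(0) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ v(N) & v(N-1) & v(N-2) & \cdots & v(N-M) \end{pmatrix} \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_M \end{pmatrix} \approx \begin{pmatrix} h_d(0) \\ h_d(1) \\ h_d(2) \\ \vdots \\ h_d(N) \end{pmatrix}$$

which has solution:

$$b_{opt} = \left(V^H V\right)^{-1} V^H h$$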


Notice that none of these methods solve the true least-squares problem:

$$\min_{a, b} \sum_{n=0}^{\infty} |h_d(n) - h(n)|^2$$

which is a difficult non-linear optimization problem. The true least-squares
problem can be written as:

$$\min_{\alpha_i, \beta_i} \sum_{n=0}^{\infty} \left| h_d(n) - \sum_{i=1}^{M} \alpha_i e^{\beta_i n} \right|^2$$

since the impulse response of an IIR filter is a sum of exponentials, and non-linear
optimization is then used to solve for the $\alpha_i$ and $\beta_i$.


Linear Prediction 


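Recall that for the all-pole design problem, we had the overdetermined set of linear equations:

$$\begin{pmatrix} h_d(0) & 0 & \cdots & 0 \\ h_d(1) & h_d(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h_d(N-1) & h_d(N-2) & \cdots & h_d(N-M) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{pmatrix} \approx -\begin{pmatrix} h_d(1) \\ h_d(2) \\ \vdots \\ h_d(N) \end{pmatrix}$$

with solution $a = -\left(H_d^H H_d\right)^{-1} H_d^H h_d$.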


Let's look more closely at $H_d^H H_d = R$. $r_{ij}$ is related to the correlation of $h_d$ with itself:

$$r_{ij} = \sum_{k=0}^{N - \max\{i, j\}} h_d(k) h_d(k + |i - j|)$$

Note also that:

$$H_d^H h_d = \begin{pmatrix} r_d(1) \\ r_d(2) \\ \vdots \\ r_d(M) \end{pmatrix}$$

where

$$r_d(i) = \sum_{n=0}^{N-i} h_d(n) h_d(n + i)$$

so this takes the form $a_{opt} = -R^{-1} r_d$, or $Ra = -r$, where R is M × M, a is M × 1, and r is also M × 1.


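Except for the changing endpoints of the sum, $r_{ij} \approx r(i - j) = r(j - i)$. If we tweak the problem slightly to
make $r_{ij} = r(i - j)$, we get:

$$\begin{pmatrix} r(0) & r(1) & r(2) & \cdots & r(M-1) \\ r(1) & r(0) & r(1) & \cdots & \vdots \\ r(2) & r(1) & r(0) & \cdots & \vdots \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ r(M-1) & \cdots & \cdots & \cdots & r(0) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_M \end{pmatrix} = -\begin{pmatrix} r(1) \\ r(2) \\ r(3) \\ \vdots \\ r(M) \end{pmatrix}$$

The matrix R is Toeplitz (diagonal elements equal), and a can be solved for with $O(M^2)$ computations using
Levinson's recursion.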
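A compact version of the recursion is sketched below, assuming NumPy and real-valued correlations; the function name `levinson` and the correlation values are illustrative. It builds the order-m solution from the order-(m − 1) solution and is compared against a general-purpose linear solve of Ra = −r.

```python
import numpy as np

def levinson(r, M):
    """Solve the Toeplitz system R a = -[r(1)..r(M)] with R[i, j] = r(|i - j|)
    in O(M^2) operations. r must hold r(0)..r(M). Returns [a_1..a_M]."""
    a = np.zeros(M + 1)
    a[0] = 1.0
    err = r[0]                                   # prediction error power
    for m in range(1, M + 1):
        # acc = r(m) + sum_{k=1}^{m-1} a_k r(m-k)
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        kappa = -acc / err                       # reflection coefficient
        a_prev = a.copy()
        for k in range(1, m):
            a[k] = a_prev[k] + kappa * a_prev[m - k]
        a[m] = kappa
        err *= (1.0 - kappa * kappa)
    return a[1:]

# Hypothetical autocorrelation values r(0)..r(3).
r = np.array([1.0, 0.5, 0.2, 0.1])
M = 3
a_levinson = levinson(r, M)

# Reference solution with a general O(M^3) solver.
R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
a_direct = np.linalg.solve(R, -r[1:])
```

The saving comes from never forming or factoring R: each order update costs O(m) operations.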


Statistical Linear Prediction 
Used very often for forecasting (e.g. stock market). 


Given a time-series y(n), assumed to be produced by an auto-regressive (AR) (all-pole) system: 


M 
y(n) = — X agy(n — k) + u(n) 
k=1 


where u(n) is a white Gaussian noise sequence which is stationary and has zero mean. 


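To determine the model parameters $\{a_k\}$ minimizing the variance of the prediction error, we seek

$$\min_{a_k} E\left[\left(y(n) + \sum_{k=1}^{M} a_k y(n - k)\right)^2\right] = \min_{a_k} \left( E\left[y^2(n)\right] + 2\sum_{k=1}^{M} a_k E\left[y(n) y(n - k)\right] + \sum_{k=1}^{M}\sum_{l=1}^{M} a_k a_l E\left[y(n - k) y(n - l)\right] \right)$$

Note: The mean of y(n) is zero.

$$\epsilon^2 = r(0) + 2\begin{pmatrix} r(1) & r(2) & \cdots & r(M) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{pmatrix} + \begin{pmatrix} a_1 & a_2 & \cdots & a_M \end{pmatrix} \begin{pmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(1) & r(0) & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ r(M-1) & \cdots & \cdots & r(0) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{pmatrix}$$

$$\frac{\partial \epsilon^2}{\partial a} = 2r + 2Ra$$

Setting [link] equal to zero yields $Ra = -r$. These are called the Yule-Walker equations. In practice, given
samples of a sequence y(n), we estimate r(n) as

$$\hat{r}(n) = \frac{1}{N} \sum_{k=0}^{N-n} y(k) y(n + k) \approx E\left[y(k) y(n + k)\right]$$

which is extremely similar to the deterministic least-squares technique.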


DFT Definition and Properties 


DFT 


The discrete Fourier transform (DFT) is the primary transform used for numerical
computation in digital signal processing. It is very widely used for spectrum analysis,
fast convolution, and many other applications. The DFT transforms N discrete-time
samples to the same number of discrete frequency samples, and is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n) e^{-i\frac{2\pi nk}{N}}$$


The DFT is widely used in part because it can be computed very efficiently using fast 
Fourier transform (FFT) algorithms. 


IDFT 
The inverse DFT (IDFT) transforms N discrete-frequency samples to the same
number of discrete-time samples. The IDFT has a form very similar to the DFT,

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{i\frac{2\pi nk}{N}}$$

and can thus also be computed efficiently using FFTs.
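The two definitions translate directly into matrix-vector products. The sketch below assumes NumPy; the function names `dft` and `idft` and the test signal are illustrative. It is an O(N²) evaluation meant only to mirror the formulas, with an FFT used as a cross-check.

```python
import numpy as np

def dft(x):
    """Direct evaluation of X(k) = sum_n x(n) exp(-i 2 pi n k / N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    kernel = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return kernel @ x

def idft(X):
    """Direct evaluation of x(n) = (1/N) sum_k X(k) exp(+i 2 pi n k / N)."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    kernel = np.exp(2j * np.pi * np.outer(k, k) / N)
    return (kernel @ X) / N

x = np.array([1.0, 2.0, 0.0, -1.0, 0.5, 3.0, -2.0, 1.5])
X = dft(x)
x_back = idft(X)
X_fft = np.fft.fft(x)   # same transform, computed with an FFT
```

The FFT produces the same numbers with on the order of N log N operations instead of N².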
DFT and IDFT properties 


Periodicity 


Due to the N-sample periodicity of the complex exponential basis functions $e^{i\frac{2\pi nk}{N}}$ in
the DFT and IDFT, the resulting transforms are also periodic with N samples.


X(k + N) = X(k) 
x(n) = x(n + N) 


Circular Shift 


A shift in time corresponds to a phase shift that is linear in frequency. Because of the 
periodicity induced by the DFT and IDFT, the shift is circular, or modulo N samples. 


$$x((n - m) \bmod N) \leftrightarrow X(k) e^{-i\frac{2\pi km}{N}}$$

The modulus operator p mod N means the remainder of p when divided by N. For
example,

$$9 \bmod 5 = 4$$

and

$$-1 \bmod 5 = 4$$


Time Reversal 
$$x((-n) \bmod N) = x((N - n) \bmod N) \leftrightarrow X((N - k) \bmod N) = X((-k) \bmod N)$$

Note: time-reversal maps 0 ↔ 0, 1 ↔ N − 1, 2 ↔ N − 2, etc., as illustrated in the figure
below.

Figure: Illustration of circular time-reversal (panels: original signal; time-reversed signal).


Complex Conjugate 


$$\overline{x(n)} \leftrightarrow \overline{X((-k) \bmod N)}$$


Circular Convolution Property 


Circular convolution is defined as 


$$x(n) \circledast h(n) = \sum_{m=0}^{N-1} x(m) h((n - m) \bmod N)$$

Circular convolution of two discrete-time signals corresponds to multiplication of their
DFTs:

$$x(n) \circledast h(n) \leftrightarrow X(k) H(k)$$
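The property is easy to confirm numerically. This sketch assumes NumPy; the function name `circular_convolve` and the two sequences are illustrative. It evaluates the defining sum directly and compares its DFT with the product of the individual DFTs.

```python
import numpy as np

def circular_convolve(x, h):
    """y(n) = sum_m x(m) h((n - m) mod N) for two length-N sequences."""
    N = len(x)
    return np.array([sum(x[m] * h[(n - m) % N] for m in range(N))
                     for n in range(N)])

x = np.array([1.0, 2.0, 3.0, 4.0, 0.0, -1.0])
h = np.array([0.5, -0.5, 0.25, 0.0, 0.0, 1.0])

y_direct = circular_convolve(x, h)

# Multiply the DFTs and transform back: the "fast convolution" route.
y_via_dft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real
```

For long sequences the DFT route is far cheaper, which is the basis of FFT-based fast convolution.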


Multiplication Property 


A similar property relates multiplication in time to circular convolution in frequency. 


$$x(n) h(n) \leftrightarrow \frac{1}{N} X(k) \circledast H(k)$$


Parseval's Theorem 


Parseval's theorem relates the energy of a length-N discrete-time signal (or one 
period) to the energy of its DFT. 


$$\sum_{n=0}^{N-1} |x(n)|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X(k)|^2$$


Symmetry 


The continuous-time Fourier transform, the DTFT, and DFT are all defined as
transforms of complex-valued data to complex-valued spectra. However, in practice
signals are often real-valued. The DFT of a real-valued discrete-time signal has a
special symmetry, in which the real part of the transform values is DFT even
symmetric and the imaginary part is DFT odd symmetric, as illustrated in the
equation and figure below.

x(n) real ↔ $X(k) = \overline{X((N - k) \bmod N)}$ (This implies X(0), $X\left(\frac{N}{2}\right)$ are real-valued.)

Figure: DFT symmetry of real-valued signal. The real part of X(k) is even (even symmetry in the DFT sense); the imaginary part of X(k) is odd (odd symmetry in the DFT sense).


Spectrum Analysis Using the Discrete Fourier Transform 


Discrete-Time Fourier Transform 


The Discrete-Time Fourier Transform (DTFT) is the primary theoretical tool 
for understanding the frequency content of a discrete-time (sampled) signal. 
The DTFT is defined as 

Equation: 


$$X(\omega) = \sum_{n=-\infty}^{\infty} x(n) e^{-i\omega n}$$

The inverse DTFT (IDTFT) is defined by an integral formula, because it
operates on a continuous-frequency DTFT spectrum:

$$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega) e^{i\omega n} \, d\omega$$


The DTFT is very useful for theory and analysis, but is not practical for 
numerically computing a spectrum digitally, because 


1. Infinite time samples means

◦ infinite computation
◦ infinite delay

2. The transform is continuous in the discrete-time frequency, $\omega$


For practical computation of the frequency content of real-world signals, the 
Discrete Fourier Transform (DFT) is used. 


Discrete Fourier Transform 


The DFT transforms N samples of a discrete-time signal to the same number of
discrete frequency samples, and is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n) e^{-i\frac{2\pi nk}{N}}$$

The DFT is invertible by the inverse discrete Fourier transform (IDFT):

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{i\frac{2\pi nk}{N}}$$

The DFT and IDFT are a self-contained, one-to-one transform pair for a length-N
discrete-time signal. (That is, the DFT is not merely an approximation to the
DTFT as discussed next.) However, the DFT is very often used as a practical
approximation to the DTFT.


Relationships Between DFT and DTFT 


DFT and Discrete Fourier Series 


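The DFT gives the discrete-time Fourier series coefficients of a periodic
sequence (x(n) = x(n + N)) of period N samples, or

$$X(\omega) = \frac{2\pi}{N} \sum_{k} X(k) \, \delta\left(\omega - \frac{2\pi k}{N}\right)$$

as can easily be confirmed by computing the inverse DTFT of the
corresponding line spectrum:

$$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( \frac{2\pi}{N} \sum_{k} X(k) \, \delta\left(\omega - \frac{2\pi k}{N}\right) \right) e^{i\omega n} \, d\omega = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{i\frac{2\pi nk}{N}} = \text{IDFT}(X(k)) = x(n)$$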


The DFT can thus be used to exactly compute the relative values of the N line 
spectral components of the DTFT of any periodic discrete-time sequence with 
an integer-length period. 


DFT and DTFT of finite-length data 


When a discrete-time sequence happens to equal zero for all samples except for
those between 0 and N − 1, the infinite sum in the DTFT equation becomes
the same as the finite sum from 0 to N − 1 in the DFT equation. By matching
the arguments in the exponential terms, we observe that the DFT values exactly
equal the DTFT for specific DTFT frequencies $\omega_k = \frac{2\pi k}{N}$. That is, the DFT
computes exact samples of the DTFT at N equally spaced frequencies
$\omega_k = \frac{2\pi k}{N}$, or

$$X\left(\omega_k = \frac{2\pi k}{N}\right) = \sum_{n=-\infty}^{\infty} x(n) e^{-i\omega_k n} = \sum_{n=0}^{N-1} x(n) e^{-i\frac{2\pi nk}{N}} = X(k)$$


DFT as a DTFT approximation 


In most cases, the signal is neither exactly periodic nor truly of finite length; in
such cases, the DFT of a finite block of N consecutive discrete-time samples
does not exactly equal samples of the DTFT at specific frequencies. Instead,
the DFT gives frequency samples of a windowed (truncated) DTFT

$$\hat{X}\left(\omega_k = \frac{2\pi k}{N}\right) = \sum_{n=0}^{N-1} x(n) e^{-i\omega_k n} = \sum_{n=-\infty}^{\infty} x(n) w(n) e^{-i\omega_k n} = X(k)$$

where

$$w(n) = \begin{cases} 1 & \text{if } 0 \le n < N \\ 0 & \text{otherwise} \end{cases}$$

Once again, X(k) exactly equals $X(\omega_k)$, a DTFT frequency sample, only when
$\forall n, n \notin [0, N - 1]: \; x(n) = 0$.


Relationship between continuous-time FT and DFT 


The goal of spectrum analysis is often to determine the frequency content of an
analog (continuous-time) signal; very often, as in most modern spectrum
analyzers, this is actually accomplished by sampling the analog signal,
windowing (truncating) the data, and computing and plotting the magnitude of
its DFT. It is thus essential to relate the DFT frequency samples back to the
original analog frequency. Assuming that the analog signal is bandlimited and
the sampling frequency exceeds twice that limit so that no frequency aliasing
occurs, the relationship between the continuous-time Fourier frequency $\Omega$ (in
radians) and the DTFT frequency $\omega$ imposed by sampling is $\omega = \Omega T$, where T
is the sampling period. Through the relationship $\omega_k = \frac{2\pi k}{N}$ between the DTFT
frequency $\omega$ and the DFT frequency index k, the correspondence between the
DFT frequency index and the original analog frequency can be found:

$$\Omega = \frac{2\pi k}{NT}$$

or in terms of analog frequency f in Hertz (cycles per second rather than
radians)

$$f = \frac{k}{NT}$$

for k in the range between 0 and $\frac{N}{2}$. It is important to note that
$k \in \left[\frac{N}{2} + 1, N - 1\right]$ correspond to negative frequencies due to the periodicity
of the DTFT and the DFT.
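The index-to-frequency mapping, including the wrap to negative frequencies, can be written as a small helper. This is a sketch in plain Python; the function name `bin_frequency_hz` and the example sampling parameters are assumptions for illustration.

```python
def bin_frequency_hz(k, N, T):
    """Analog frequency in Hz for DFT index k (0 <= k < N) with sampling period T.

    Indices above N/2 correspond to negative frequencies."""
    if k > N // 2:
        k -= N
    return k / (N * T)

# Hypothetical setup: N = 8 samples taken at 1 kHz (T = 1 ms).
N, T = 8, 0.001
frequencies = [bin_frequency_hz(k, N, T) for k in range(N)]
```

With these values the bin spacing is 1/(NT) = 125 Hz, index 4 sits at the 500 Hz Nyquist frequency, and indices 5 through 7 map to −375, −250, and −125 Hz.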
Exercise: 


Problem: 


In general, will DFT frequency values X(k) exactly equal samples of the
analog Fourier transform $X_a$ at the corresponding frequencies? That is,
will $X(k) = X_a\left(\frac{2\pi k}{NT}\right)$?

Solution: 


In general, NO. The DTFT exactly corresponds to the continuous-time
Fourier transform only when the signal is bandlimited and sampled at
more than twice its highest frequency. The DFT frequency values exactly
correspond to frequency samples of the DTFT only when the discrete-time
signal is time-limited. However, a bandlimited continuous-time signal
cannot be time-limited, so in general these conditions cannot both be
satisfied.

It can, however, be true for a small class of analog signals which are not
time-limited but happen to exactly equal zero at all sample times outside
of the interval $n \in [0, N - 1]$. The sinc function with a bandwidth equal
to the Nyquist frequency and centered at t = 0 is an example.


Zero-Padding 


If more than N equally spaced frequency samples of a length-N signal are
desired, they can easily be obtained by zero-padding the discrete-time signal
and computing a DFT of the longer length. In particular, if LN DTFT samples
are desired of a length-N sequence, one can compute the length-LN DFT of a
length-LN zero-padded sequence

$$x_{zp}(n) = \begin{cases} x(n) & \text{if } 0 \le n \le N - 1 \\ 0 & \text{if } N \le n \le LN - 1 \end{cases}$$

$$X\left(\omega_k = \frac{2\pi k}{LN}\right) = \sum_{n=0}^{N-1} x(n) e^{-i\frac{2\pi kn}{LN}} = \sum_{n=0}^{LN-1} x_{zp}(n) e^{-i\frac{2\pi kn}{LN}} = \text{DFT}_{LN}\left[x_{zp}(n)\right]$$

Note that zero-padding interpolates the spectrum. One should always zero-pad
(by at least a factor of about 4) when using the DFT to approximate the DTFT
to get a clear picture of the DTFT. While performing computations on zeros
may at first seem inefficient, using FFT algorithms, which generally expect the
same number of input and output samples, actually makes this approach very
efficient.
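The claim that zero-padding yields exact DTFT samples can be checked directly. The sketch below assumes NumPy; the helper `dtft`, the signal, and the padding factor are illustrative. It relies on `numpy.fft.fft` appending zeros when its `n` argument exceeds the input length.

```python
import numpy as np

def dtft(x, w):
    """Evaluate X(w) = sum_n x(n) exp(-i w n) for a finite-length x at frequencies w."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(w, n)) @ x

N, pad_factor = 16, 4
n = np.arange(N)
x = np.cos(0.9 * n) + 0.5 * np.cos(1.1 * n)      # hypothetical two-tone test signal

# Length-(L N) DFT of the zero-padded signal.
X_zp = np.fft.fft(x, n=pad_factor * N)

# DTFT of the length-N signal at the L N equally spaced frequencies 2 pi k / (L N).
wk = 2.0 * np.pi * np.arange(pad_factor * N) / (pad_factor * N)
X_dtft = dtft(x, wk)

# Every pad_factor-th sample coincides with the unpadded length-N DFT.
X_unpadded = np.fft.fft(x)
```

Zero-padding adds no new information about the signal; it only samples the same underlying DTFT on a finer frequency grid.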


[link] shows the magnitude of the DFT values corresponding to the non- 
negative frequencies of a real-valued length-64 DFT of a length-64 signal, both 
in a "stem" format to emphasize the discrete nature of the DFT frequency 
samples, and as a line plot to emphasize its use as an approximation to the 
continuous-in-frequency DTFT. From this figure, it appears that the signal has a 
single dominant frequency component. 

Figure: Spectrum without zero-padding. Magnitude DFT spectrum of 64 samples of a signal with a length-64 DFT (no zero padding).

Zero-padding by a factor of two by appending 64 zero values to the signal and
computing a length-128 DFT yields [link]. It can now be seen that the signal 
consists of at least two narrowband frequency components; the gap between 
them fell between DFT samples in [link], resulting in a misleading picture of 
the signal's spectral content. This is sometimes called the picket-fence effect, 
and is a result of insufficient sampling in frequency. While zero-padding by a 
factor of two has revealed more structure, it is unclear whether the peak 
magnitudes are reliably rendered, and the jagged linear interpolation in the line 
graph does not yet reflect the smooth, continuously-differentiable spectrum of 
the DTFT of a finite-length truncated signal. Errors in the apparent peak
magnitude due to insufficient frequency sampling are sometimes referred to as
scalloping loss.

Spectrum with factor-of-two zero-padding 


Magnitude DFT spectrum of 64 samples of a signal with a length-128 
DFT (double-length zero-padding) 


Stem plot 
Zero-padding to four times the length of the signal, as shown in [link], clearly 
shows the spectral structure and reveals that the magnitudes of the two spectral 
lines are nearly identical. The line graph is still a bit rough and the peak 
magnitudes and frequencies may not be precisely captured, but the spectral 
characteristics of the truncated signal are now clear. 

Spectrum with factor-of-four zero-padding 


Magnitude DFT spectrum of 64 samples of a signal with a length-256 
zero-padded DFT (four times zero-padding) 


Stem plot 
Zero-padding to a length of 1024, as shown in [link], yields a spectrum that is 
smooth and continuous to the resolution of the computer screen, and produces a 
very accurate rendition of the DTFT of the truncated signal. 

Spectrum with factor-of-sixteen zero-padding 


Magnitude DFT spectrum of 64 samples of a signal with a length-1024 
zero-padded DFT. The spectrum now looks smooth and continuous and 
reveals all the structure of the DTFT of a truncated signal. 


Stem plot 
The signal used in this example actually consisted of two pure sinusoids of 
equal magnitude. The slight difference in magnitude of the two dominant 
peaks, the breadth of the peaks, and the sinc-like lesser side lobe peaks 
throughout frequency are artifacts of the truncation, or windowing, process 
used to practically approximate the DTFT. These problems and partial solutions 
to them are discussed in the following section. 


Effects of Windowing 
Applying the DTFT multiplication property 

$$x(n)\,w(n) \;\longleftrightarrow\; \frac{1}{2\pi}\,X(\omega) * W(\omega)$$

we find that the DFT of the windowed (truncated) signal produces samples not 
of the true (desired) DTFT spectrum X(ω), but of a smoothed version 
X(ω)∗W(ω). We want this to resemble X(ω) as closely as possible, so W(ω) 
should be as close to an impulse as possible. The window w(n) need not be a 
simple truncation (or rectangle, or boxcar) window; other shapes can also be 
used as long as they limit the sequence to at most N consecutive nonzero 
samples. All good windows are impulse-like, and represent various tradeoffs 
between three criteria: 


1. main lobe width: (limits resolution of closely-spaced peaks of equal 
height) 

2. height of first sidelobe: (limits ability to see a small peak near a big peak) 

3. slope of sidelobe drop-off: (limits ability to see small peaks further away 
from a big peak) 


Many different window functions have been developed for truncating and 
shaping a length-N signal segment for spectral analysis. The simple truncation 
window has a periodic sinc DTFT, as shown in [link]. It has the narrowest 
main-lobe width, about 2π/N at the -3 dB level and 4π/N between the two zeros 
surrounding the main lobe, of the common window functions, but also the 
largest side-lobe peak, at about -13 dB. The side-lobes also taper off relatively 
slowly. 


Length-64 truncation (boxcar) window and its magnitude DFT spectrum 
The Hann window (sometimes also called the hanning window), illustrated in 
[link], takes the form w[n] = 0.5 − 0.5 cos(2πn/(N − 1)) for n between 0 and 
N − 1. It has a main-lobe width (about 3π/N at the -3 dB level and 8π/N between 
the two zeros surrounding the main lobe) considerably larger than the rectangular 
window, but the largest side-lobe peak is much lower, at about -31.5 dB. The 
side-lobes also taper off much faster. For a given length, this window is worse 
than the boxcar window at separating closely-spaced spectral components of 
similar magnitude, but better for identifying smaller-magnitude components at 
a greater distance from the larger components. 


Length-64 Hann window and its magnitude DFT spectrum 


Hann window 
The Hamming window, illustrated in [link], has a form similar to the Hann 
window but with slightly different constants: 
w[n] = 0.538 − 0.462 cos(2πn/(N − 1)) for n between 0 and N − 1. Since it is 
composed of the same Fourier series harmonics as the Hann window, it has a 
similar main-lobe width (a bit less than 3π/N at the -3 dB level and 8π/N between 
the two zeros surrounding the main lobe), but the largest side-lobe peak is much 
lower, at about -42.5 dB. However, the side-lobes also taper off much more 
slowly than with the Hann window. For a given length, the Hamming window 
is better than the Hann (and of course the boxcar) windows at separating a 
small component relatively near to a large component, but worse than the Hann 
for identifying very small components at considerable frequency separation. 
Due to their shape and form, the Hann and Hamming windows are also known 
as raised-cosine windows. 
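
The side-lobe levels quoted for these three windows can be estimated numerically. The sketch below, in Python (standard library only), evaluates each window's DTFT on a dense frequency grid and reports the largest side-lobe relative to the main-lobe peak. The function names, the grid density, and the simple "walk down the main lobe" search are assumptions made for illustration; the window formulas are the raised-cosine forms with denominator N − 1.

```python
import cmath
import math

def rectangular(n_len):
    """Simple truncation (boxcar) window."""
    return [1.0] * n_len

def hann(n_len):
    """Hann window, w[n] = 0.5 - 0.5 cos(2 pi n / (N - 1))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]

def hamming(n_len):
    """Hamming window, w[n] = 0.538 - 0.462 cos(2 pi n / (N - 1))."""
    return [0.538 - 0.462 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]

def peak_sidelobe_db(window, grid=2048):
    """Largest side-lobe magnitude in dB relative to the main-lobe peak at omega = 0."""
    mags = []
    for i in range(grid):
        omega = math.pi * i / grid
        value = sum(w * cmath.exp(-1j * omega * n) for n, w in enumerate(window))
        mags.append(abs(value))
    # Walk down the main lobe until the magnitude starts rising again.
    i = 1
    while i < grid - 1 and mags[i + 1] < mags[i]:
        i += 1
    # Everything from the first minimum onward is side-lobe territory.
    return 20 * math.log10(max(mags[i:]) / mags[0])

rect_db = peak_sidelobe_db(rectangular(64))
hann_db = peak_sidelobe_db(hann(64))
hamm_db = peak_sidelobe_db(hamming(64))
```

For length-64 windows the three results should fall in the same order as the figures quoted in the text (rectangular highest, Hamming lowest); the exact values depend slightly on the window length and on the grid density.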


Length-64 Hamming window and its magnitude DFT spectrum 
Note: Standard even-length windows are symmetric around a point halfway 
between the window samples N/2 − 1 and N/2. For some applications such as 
time-frequency analysis, it may be important to align the window perfectly to a 
sample. In such cases, a DFT-symmetric window that is symmetric around the 
N/2-th sample can be used. For example, the DFT-symmetric Hamming 
window is w[n] = 0.538 − 0.462 cos(2πn/N). A DFT-symmetric window has a 
purely real-valued DFT and DTFT. DFT-symmetric versions of windows, such 
as the Hamming and Hann windows, composed of few discrete Fourier series 
terms of period N, have few non-zero DFT terms (only when not zero-padded) 
and can be used efficiently in running FFTs. 


The main-lobe width of a window is an inverse function of the window-length 
N; for any type of window, a longer window will always provide better 
resolution. 


Many other windows exist that make various other tradeoffs between main-lobe 
width, height of largest side-lobe, and side-lobe rolloff rate. The Kaiser window 
family, based on a modified Bessel function, has an adjustable parameter that 
allows the user to tune the tradeoff over a continuous range. The Kaiser 
window has near-optimal time-frequency resolution and is widely used. A list 
of many different windows can be found here. 


Example: 
[link] shows 64 samples of a real-valued signal composed of several sinusoids 
of various frequencies and amplitudes. 
64 samples of an unknown signal 


[link] shows the magnitude (in dB) of the positive frequencies of a length-1024 
zero-padded DFT of this signal (that is, using a simple truncation, or 
rectangular, window). 
Magnitude (in dB) of the zero-padded DFT spectrum of the signal 
in [link] using a simple length-64 rectangular window 


From this spectrum, it is clear that the signal has two large, nearby frequency 
components of essentially the same magnitude, with frequencies near 1 radian. 
[link] shows the spectral estimate produced using a length-64 Hamming 
window applied to the same signal shown in [link]. 
Magnitude (in dB) of the zero-padded DFT spectrum of the signal 
in [link] using a length-64 Hamming window 


The two large spectral peaks can no longer be resolved; they blur into a single 
broad peak due to the reduced spectral resolution of the broader main lobe of 
the Hamming window. However, the lower side-lobes reveal a third 
component at a frequency of about 0.7 radians at about 35 dB lower magnitude 
than the larger components. This component was entirely buried under the 
side-lobes when the rectangular window was used, but now stands out well 
above the much lower nearby side-lobes of the Hamming window. 

[link] shows the spectral estimate produced using a length-64 Hann window 
applied to the same signal shown in [link]. 
Magnitude (in dB) of the zero-padded DFT spectrum of the signal 
in [link] using a length-64 Hann window 


The two large components again merge into a single peak, and the smaller 
component observed with the Hamming window is largely lost under the 
higher nearby side-lobes of the Hann window. However, due to the much faster 
side-lobe rolloff of the Hann window's spectrum, a fourth component at a 
frequency of about 2.5 radians with a magnitude about 65 dB below that of the 
main peaks is now clearly visible. 

This example illustrates that no single window is best for all spectrum 
analyses. The best window depends on the nature of the signal, and different 
windows may be better for different components of the same signal. A skilled 
spectrum analyst may apply several different windows to a signal to gain a 
fuller understanding of the data. 


Classical Statistical Spectral Estimation 


Many signals are either partly or wholly stochastic, or random. Important 
examples include human speech, vibration in machines, and CDMA 
communication signals. Given the ever-present noise in electronic systems, 
it can be argued that almost all signals are at least partly stochastic. Such 
signals may have a distinct average spectral structure that reveals important 
information (such as for speech recognition or early detection of damage in 
machinery). Spectrum analysis of any single block of data using window- 
based deterministic spectrum analysis, however, produces a random 
spectrum that may be difficult to interpret. For such situations, the classical 
statistical spectrum estimation methods described in this module can be 
used. 


The goal in classical statistical spectrum analysis is to estimate 
E[|X(ω)|²], the power spectral density (PSD) across frequency of the 
stochastic signal. That is, the goal is to find the expected (mean, or average) 
energy density of the signal as a function of frequency. (For zero-mean 
signals, this equals the variance of each frequency sample.) Since the 
spectrum of each block of signal samples is itself random, we must average 
the squared spectral magnitudes over a number of blocks of data to find the 
mean. There are two main classical approaches, the periodogram and auto- 
correlation methods. 


Periodogram method 


The periodogram method divides the signal into a number of shorter (and 
often overlapped) blocks of data, computes the squared magnitude of the 
windowed (and usually zero-padded) DFT, X_i(ω_k), of each block, and 
averages them to estimate the power spectral density. That is, the squared 
magnitudes of the DFTs of L possibly overlapped length-N windowed 
blocks of signal (each probably with zero-padding) are averaged to estimate 
the power spectral density: 

$$\hat{X}(\omega_k) = \frac{1}{L} \sum_{i=1}^{L} \left| X_i(\omega_k) \right|^2$$

For a fixed total number of samples, this introduces a tradeoff: larger 
individual data blocks provide better frequency resolution due to the use of 
a longer window, but there are then fewer blocks to average, so the 
estimate has higher variance and appears noisier. The best tradeoff 
depends on the application. Overlapping blocks by a factor of two to four 
increases the number of averages and reduces the variance, but since the 
same data is being reused, still more overlapping does not further reduce the 
variance. As with any window-based spectrum estimation procedure, the 
window function introduces broadening and sidelobes into the power 
spectrum estimate. That is, the periodogram produces an estimate of the 
windowed spectrum $\hat{X}(\omega) = E\left[|X(\omega) * W(\omega)|^2\right]$, 
not of $E\left[|X(\omega)|^2\right]$. 
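
The block-averaging procedure can be sketched in a few lines. This is a minimal illustration in Python (standard library only), not a tuned implementation: the function name `averaged_periodogram`, its argument names, and the use of a direct DFT in place of an FFT are assumptions made for clarity. It returns the averaged squared magnitudes together with the number of blocks L that were averaged.

```python
import cmath
import math

def averaged_periodogram(x, block_len, step, nfft, window=None):
    """Average |DFT|^2 over windowed, zero-padded blocks taken every `step` samples."""
    if window is None:
        window = [1.0] * block_len          # simple truncation window
    starts = range(0, len(x) - block_len + 1, step)
    psd = [0.0] * nfft
    for start in starts:
        block = [x[start + n] * window[n] for n in range(block_len)]
        for k in range(nfft):
            # Zero-padding is implicit: samples beyond block_len contribute nothing.
            value = sum(block[n] * cmath.exp(-2j * math.pi * k * n / nfft)
                        for n in range(block_len))
            psd[k] += abs(value) ** 2
    # Divide by the number of blocks L to form the average.
    return [p / len(starts) for p in psd], len(starts)
```

With `step = block_len // 4` the blocks overlap by a factor of four, as in the example that follows. Shrinking `block_len` for a fixed signal length increases the number of blocks averaged (lower variance) at the cost of a wider main lobe (lower resolution).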


Example: 
[link] shows the non-negative frequencies of the DFT (zero-padded to 
1024 total samples) of 64 samples of a real-valued stochastic signal. 
DFT magnitude (in dB) of 64 samples of a stochastic signal 


With no averaging, the power spectrum is very noisy and difficult to 
interpret other than noting a significant reduction in spectral energy above 
about half the Nyquist frequency. Various peaks and valleys appear in the 
lower frequencies, but it is impossible to say from this figure whether they 
represent actual structure in the power spectral density (PSD) or simply 
random variation in this single realization. [link] shows the same 
frequencies of a length-1024 DFT of a length-1024 signal. While the 
frequency resolution has improved, there is still no averaging, so it remains 
difficult to understand the power spectral density of this signal. Certain 
small peaks in frequency might represent narrowband components in the 
spectrum, or may just be random noise peaks. 
DFT magnitude (in dB) of 1024 samples of a stochastic signal 


In [link], a power spectral density computed from averaging the squared 
magnitudes of length-1024 zero-padded DFTs of 508 length-64 blocks of 
data (overlapped by a factor of four, or a 16-sample step between blocks) 
is shown. 
Power spectral density estimate (in dB) of 1024 samples of a 
stochastic signal 


While the frequency resolution corresponds to that of a length-64 
truncation window, the averaging greatly reduces the variance of the 
spectral estimate and allows the user to reliably conclude that the signal 
consists of lowpass broadband noise with a flat power spectrum up to half 
the Nyquist frequency, with a stronger narrowband frequency component 
at around 0.65 radians. 


Auto-correlation-based approach 


The averaging necessary to estimate a power spectral density can be 
performed in the discrete-time domain, rather than in frequency, using the 
auto-correlation method. The squared magnitude of the frequency response, 
from the DTFT multiplication and conjugation properties, corresponds in 
the discrete-time domain to the signal convolved with the time-reverse of 
itself, 


$$|X(\omega)|^2 = X(\omega)\,X^*(\omega) \;\longleftrightarrow\; x(n) * x^*(-n) = r(n)$$


or its auto-correlation 


$$r(n) = \sum_{k=-\infty}^{\infty} x(k)\,x^*(n+k)$$
We can thus compute the squared magnitude of the spectrum of a signal by 
computing the DFT of its auto-correlation. For stochastic signals, the power 
spectral density is an expectation, or average, and by linearity of 
expectation can be found by transforming the average of the 
auto-correlation. For a finite block of N signal samples, the average of the 
autocorrelation values, r(n), is 


$$r(n) = \frac{1}{N-n} \sum_{k=0}^{N-1-n} x(k)\,x^*(n+k)$$


Note that with increasing lag, n, fewer values are averaged, so they 
introduce more noise into the estimated power spectrum. By windowing the 
auto-correlation before transforming it to the frequency domain, a less noisy 
power spectrum is obtained, at the expense of less resolution. The 
multiplication property of the DTFT shows that the windowing smooths the 
resulting power spectrum via convolution with the DTFT of the window: 
This yields another important interpretation of how the auto-correlation 
method works: it estimates the power spectral density by averaging the 
power spectrum over nearby frequencies, through convolution with the 
window function's transform, to reduce variance. Just as with the 
periodogram approach, there is always a variance vs. resolution tradeoff. 
The periodogram and the auto-correlation method give similar results for a 
similar amount of averaging; the user should simply note that in the 
periodogram case, the window introduces smoothing of the spectrum via 
frequency convolution before squaring the magnitude, whereas the 
auto-correlation method convolves the squared magnitude with W(ω). 
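
The auto-correlation approach can be sketched as follows, in Python (standard library only). Several choices here are assumptions made for illustration: the data are taken to be real-valued (so that r(−n) = r(n) and the transform reduces to a cosine sum), the lag window is triangular, and the function names are hypothetical. The averaged auto-correlation divides each lag by the number of products actually summed, N − n.

```python
import math

def autocorrelation_estimate(x, max_lag):
    """r(n) = 1/(N-n) * sum_{k=0}^{N-1-n} x(k) x(n+k) for real x, n = 0..max_lag."""
    n_len = len(x)
    return [sum(x[k] * x[k + lag] for k in range(n_len - lag)) / (n_len - lag)
            for lag in range(max_lag + 1)]

def autocorrelation_psd(x, max_lag, omegas):
    """Windowed-autocorrelation PSD estimate at the given frequencies (radians)."""
    r = autocorrelation_estimate(x, max_lag)
    # Triangular lag window (an assumed choice); it tapers the noisier large lags.
    taper = [1.0 - lag / (max_lag + 1) for lag in range(max_lag + 1)]
    psd = []
    for omega in omegas:
        # For real x, r(-n) = r(n), so the transform reduces to a cosine sum.
        total = taper[0] * r[0]
        total += 2.0 * sum(taper[lag] * r[lag] * math.cos(omega * lag)
                           for lag in range(1, max_lag + 1))
        psd.append(total)
    return psd
```

Here `max_lag` plays the role that block length plays in the periodogram: a small `max_lag` gives a short lag window, hence a smoother but lower-resolution estimate.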


Short Time Fourier Transform 


The Fourier transforms (FT, DTFT, DFT, etc.) do not clearly indicate how 
the frequency content of a signal changes over time. 


That information is hidden in the phase - it is not revealed by the plot of the 
magnitude of the spectrum. 


Note: To see how the frequency content of a signal changes over time, we 
can cut the signal into blocks and compute the spectrum of each block. 


To improve the result, 


1. blocks are overlapping 
2. each block is multiplied by a window that is tapered at its endpoints. 


Several parameters must be chosen: 


• Block length, R. 
• The type of window. 
• Amount of overlap between blocks. ([link]) 
• Amount of zero padding, if any. 

The parameter L 


L is the number of samples between adjacent blocks. 


The short-time Fourier transform is defined as 
Equation: 
$$X(\omega, m) = \mathrm{STFT}(x(n)) = \mathrm{DTFT}(x(n-m)\,w(n)) = \sum_{n=0}^{R-1} x(n-m)\,w(n)\,e^{-j\omega n}$$
where w(n) is the window function of length R. 


1. The STFT of a signal x(n) is a function of two variables: time and 
frequency. 

2. The block length is determined by the support of the window function 
w(n). 

3. A graphical display of the magnitude of the STFT, |X(ω, m)|, is called 
the spectrogram of the signal. It is often used in speech processing. 

4. The STFT of a signal is invertible. 

5. One can choose the block length. A long block length will provide 
higher frequency resolution (because the main-lobe of the window 
function will be narrow). A short block length will provide higher time 
resolution because less averaging across samples is performed for each 
STFT value. 

6. A narrow-band spectrogram is one computed using a relatively long 
block length R, (long window function). 

7. A wide-band spectrogram is one computed using a relatively short 
block length R, (short window function). 


Sampled STFT 


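To numerically evaluate the STFT, we sample the frequency axis ω in N 
equally spaced samples from ω = 0 to ω = 2π. 
Equation: 
$$\omega_k = \frac{2\pi}{N}\,k, \qquad 0 \le k \le N-1$$
We then have the discrete STFT, 
Equation: 


$$X^d(k, m) = X\left(\frac{2\pi}{N}k,\, m\right) = \sum_{n=0}^{R-1} x(n-m)\,w(n)\,e^{-j\frac{2\pi}{N}kn} = \mathrm{DFT}_N\left(\left[x(n-m)\,w(n)\right]_{n=0}^{R-1},\; 0, \ldots, 0\right)$$


where the number of appended zeros 0, ..., 0 is N − R. 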


In this definition, the overlap between adjacent blocks is R — 1. The signal 
is shifted along the window one sample at a time. That generates more 
points than is usually needed, so we also sample the STFT along the time 
direction. That means we usually evaluate 


X^d(k, Lm) 


where L is the time-skip. The relation between the time-skip, the number of 
overlapping samples, and the block length is 


Overlap = R − L 
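
The sampled STFT can be sketched directly from these definitions. The following Python fragment (standard library only) is an illustration under stated assumptions: the function name `stft`, the test signal, and the parameter values are hypothetical, block number b is taken to start at sample b·L (an indexing convention chosen for convenience), and a direct DFT stands in for an FFT.

```python
import cmath
import math

def stft(x, window, hop, nfft):
    """Sampled STFT: one zero-padded length-nfft DFT per block of len(window) samples."""
    block_len = len(window)
    frames = []
    for start in range(0, len(x) - block_len + 1, hop):
        block = [x[start + n] * window[n] for n in range(block_len)]
        frames.append([sum(block[n] * cmath.exp(-2j * math.pi * k * n / nfft)
                           for n in range(block_len))
                       for k in range(nfft)])
    return frames

R, L, N = 16, 4, 32                      # block length, time-skip, DFT length
window = [0.538 - 0.462 * math.cos(2 * math.pi * n / (R - 1)) for n in range(R)]

# Hypothetical test signal: a sinusoid whose frequency jumps halfway through.
x = [math.cos(0.25 * math.pi * n) if n < 64 else math.cos(0.75 * math.pi * n)
     for n in range(128)]

frames = stft(x, window, L, N)           # frames[m][k] is the STFT at block m, bin k
overlap = R - L                          # samples shared by adjacent blocks
```

Taking `abs()` of each entry of `frames` gives the spectrogram values. Early blocks peak at the bin of the first tone and late blocks at the bin of the second, which is exactly the time-varying information a single long DFT magnitude would hide.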


Exercise: 


Problem: Match each signal to its spectrogram in [link]. 
Solution: 


Spectrogram Example 
The MATLAB program for producing the figures above ([link] and [link]): 


% LOAD DATA 
load mtlb; 
x = mtlb; 

figure(1), clf 
plot(0:4000, x) 
xlabel('n') 
ylabel('x(n)') 

% SET PARAMETERS 
R = 256;               % R: block length 
window = hamming(R);   % window function of length R 
N = 512;               % N: frequency discretization 
L = 35;                % L: time lapse between blocks 
fs = 7418;             % fs: sampling frequency 
overlap = R - L;       % number of samples shared by adjacent blocks 

% COMPUTE SPECTROGRAM 
[B,f,t] = specgram(x,N,fs,window,overlap); 

% MAKE PLOT 
figure(2), clf 
imagesc(t,f,log10(abs(B))); 
colormap('jet') 
axis xy 
xlabel('time') 
ylabel('frequency') 
title('SPECTROGRAM, R = 256') 


Effect of window length R 


Narrow-band spectrogram: better frequency resolution 
SPECTROGRAM, R = 512 


Wide-band spectrogram: better time resolution 
SPECTROGRAM, R = 128 


Here is another example to illustrate the frequency/time resolution trade-off 
(See figures - [link], [link], and [link]). 
Effect of Window Length R 
A spectrogram is computed with different parameters: 


L ∈ {1, 10} 


N ∈ {32, 256} 


• L = time lapse between blocks. 
• N = FFT length (Each block is zero-padded to length N.) 


In each case, the block length is 30 samples. 
Exercise: 


Problem: 


For each of the four spectrograms in [link] can you tell what L and N 


are? 
L and N do not affect the time resolution or the frequency resolution. They 
only affect the 'pixelation'. 


Effect of R and L 


Shown below are four spectrograms of the same signal. Each spectrogram 
is computed using a different set of parameters. 


R ∈ {120, 256, 1024} 


L ∈ {35, 250} 


where 


• R = block length 
• L = time lapse between blocks. 


Exercise: 


Problem: 


For each of the four spectrograms in [link], match the above values of 
L and R. 
Solution: 


If you like, you may listen to this signal with the soundsc command; the 
data is in the file: stft_data.m. Here is a figure of the signal. 


(Figure: the signal plotted against sample index n, for n from 0 to 10000.) 


Overview of Fast Fourier Transform (FFT) Algorithms 


A fast Fourier transform, or FFT, is not a new transform, but is a 
computationally efficient algorithm for computing the DFT. The length-N 
DFT, defined as 

Equation: 
$$X(k) = \sum_{n=0}^{N-1} x(n)\,e^{-j\frac{2\pi nk}{N}}$$

where X(k) and x(n) are in general complex-valued and 0 ≤ k, 
n ≤ N − 1, requires N complex multiplies to compute each X(k). Direct 
computation of all N frequency samples thus requires N² complex 
multiplies and N(N − 1) complex additions. (This assumes 
precomputation of the DFT coefficients $W_N^{nk} = e^{-j\frac{2\pi nk}{N}}$; otherwise, the 
cost is even higher.) For the large DFT lengths used in many applications, 
N² operations may be prohibitive. (For example, digital terrestrial 
television broadcast in Europe uses N = 2048 or 8192 OFDM channels, and 
the SETI project uses up to length-4194304 DFTs.) DFTs are thus almost 
always computed in practice by an FFT algorithm. FFTs are very widely 
used in signal processing, for applications such as spectrum analysis and 
digital filtering via fast convolution. 


History of the FFT 


It is now known that C.F. Gauss invented an FFT in 1805 or so to assist the 
computation of planetary orbits via discrete Fourier series. Various FFT 
algorithms were independently invented over the next two centuries, but 
FFTs achieved widespread awareness and impact only with the Cooley and 
Tukey algorithm published in 1965, which came at a time of increasing use 
of digital computers and when the vast range of applications of numerical 
Fourier techniques was becoming apparent. Cooley and Tukey's algorithm 
spawned a surge of research in FFTs and was also partly responsible for the 
emergence of Digital Signal Processing (DSP) as a distinct, recognized 
discipline. Since then, many different algorithms have been rediscovered or 
developed, and efficient FFTs now exist for all DFT lengths. 


Summary of FFT algorithms 


The main strategy behind most FFT algorithms is to factor a length-N DFT 
into a number of shorter-length DFTs, the outputs of which are reused 
multiple times (usually in additional short-length DFTs!) to compute the 
final results. The lengths of the short DFTs correspond to integer factors of 
the DFT length, N, leading to different algorithms for different lengths and 
factors. By far the most commonly used FFTs select N = 2^M to be a power 
of two, leading to the very efficient power-of-two FFT algorithms, 
including the decimation-in-time radix-2 FFT and the 
decimation-in-frequency radix-2 FFT algorithms, the radix-4 FFT (N = 4^M), and the 
split-radix FFT. Power-of-two algorithms gain their high efficiency from 
extensive reuse of intermediate results and from the low complexity of 
length-2 and length-4 DFTs, which require no multiplications. Algorithms 
for lengths with repeated common factors (such as 2 or 4 in the radix-2 and 
radix-4 algorithms, respectively) require extra twiddle factor 
multiplications between the short-length DFTs, which together lead to a 
computational complexity of O(N log N), a very considerable savings over 
direct computation of the DFT. 


The other major class of algorithms is the Prime-Factor Algorithms (PFA). 
In PFAs, the short-length DFTs must be of relatively prime lengths. These 
algorithms gain efficiency by reuse of intermediate computations and by 
eliminating twiddle-factor multiplies, but require more operations than the 
power-of-two algorithms to compute the short DFTs of various prime 
lengths. In the end, the computational costs of the prime-factor and the 
power-of-two algorithms are comparable for similar lengths, as illustrated 
in Choosing the Best FFT Algorithm. Prime-length DFTs cannot be 
factored into shorter DFTs, but in different ways both Rader's conversion 
and the chirp z-transform convert prime-length DFTs into convolutions of 
other lengths that can be computed efficiently using FFTs via fast 
convolution. 


Some applications require only a few DFT frequency samples, in which 
case Goertzel's algorithm halves the number of computations relative to the 
DFT sum. Other applications involve successive DFTs of overlapped blocks 
of samples, for which the running FFT can be more efficient than separate 
FFTs of each block. 


Running FFT 


Some applications need DFT frequencies of the most recent N samples on an ongoing basis. One example is 
DTMF, or touch-tone telephone dialing, in which a detection circuit must constantly monitor the line for two 
simultaneous frequencies indicating that a telephone button is depressed. In such cases, most of the data in each 
successive block of samples is the same, and it is possible to efficiently update the DFT value from the previous 
sample to compute that of the current sample. [link] illustrates successive length-4 blocks of data for which 
successive DFT values may be needed. The running FFT algorithm described here can be used to compute 
successive DFT values at a cost of only two complex multiplies and additions per DFT frequency. 
The running FFT efficiently computes DFT values for successive overlapped blocks of samples. 


The running FFT algorithm is derived by expressing each DFT sample, X_{n+1}(ω_k), for the next block at time 
n + 1 in terms of the previous value, X_n(ω_k), at time n. 
Equation: 
$$X_n(\omega_k) = \sum_{p=0}^{N-1} x(n-p)\,e^{-j\omega_k p}$$
$$X_{n+1}(\omega_k) = \sum_{p=0}^{N-1} x(n+1-p)\,e^{-j\omega_k p} = x(n+1) + e^{-j\omega_k} \sum_{p=0}^{N-2} x(n-p)\,e^{-j\omega_k p}$$
Now let's add and subtract $e^{-j\omega_k (N-1)}\,x(n-N+1)$: 
Equation: 
$$X_{n+1}(\omega_k) = x(n+1) + e^{-j\omega_k} X_n(\omega_k) - e^{-j\omega_k N}\,x(n-N+1)$$
This running FFT algorithm requires only two complex multiplies and adds per update, rather than N if each DFT 
value were recomputed according to the DFT equation. Another advantage of this algorithm is that it works for 
any ω_k, rather than just the standard DFT frequencies. This can make it advantageous for applications, such as 
DTMF detection, where only a few arbitrary frequencies are needed. 
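
A minimal sketch of the recursive update in Python (standard library only) follows. It assumes the convention X_n(ω) = Σ_{p=0}^{N−1} x(n − p) e^{−jωp}, under which the update reads X_{n+1}(ω) = x(n+1) + e^{−jω} X_n(ω) − e^{−jωN} x(n − N + 1); this convention, and the function names, are assumptions made for illustration rather than a definitive statement of the algorithm.

```python
import cmath
import math

def block_dft_value(x, n, block_len, omega):
    """X_n(omega) = sum_{p=0}^{N-1} x(n - p) e^{-j omega p} (assumed convention)."""
    return sum(x[n - p] * cmath.exp(-1j * omega * p) for p in range(block_len))

def running_dft(x, block_len, omega):
    """Yield X_n(omega) for n = N-1, N, ..., len(x)-1 using the recursive update."""
    rotate = cmath.exp(-1j * omega)                      # e^{-j omega}
    oldest_weight = cmath.exp(-1j * omega * block_len)   # e^{-j omega N}
    # Start-up: compute the first block directly.
    value = block_dft_value(x, block_len - 1, block_len, omega)
    yield value
    for n in range(block_len - 1, len(x) - 1):
        # Two complex multiplies per update: rotate the old value, weight the oldest sample.
        value = x[n + 1] + rotate * value - oldest_weight * x[n - block_len + 1]
        yield value
```

The frequency `omega` need not be a DFT bin frequency. When it is one (ω = 2πk/N), the weight e^{−jωN} equals 1 and the second multiply disappears. In long-running use, rounding error can accumulate in the recursion, so practical designs may need to reinitialize periodically.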


Successive computation of a specific DFT frequency for overlapped blocks can also be thought of as a length-N 
FIR filter. The running FFT is an efficient recursive implementation of this filter for this special case. [link] shows 
a block diagram of the running FFT algorithm. The running FFT is one way to compute DFT filterbanks. If a 
window other than rectangular is desired, a running FFT requires either a fast recursive implementation of the 
corresponding windowed, modulated impulse response, or it must have few non-zero coefficients so that it can be 
applied after the running FFT update via frequency-domain convolution. DFT-symmetric raised-cosine windows 
are an example. 


Block diagram of the running FFT computation, implemented as a recursive filter 


Goertzel's Algorithm 


Some applications require only a few DFT frequencies. One example is frequency-shift keying (FSK) 
demodulation, in which typically two frequencies are used to transmit binary data; another example is 
DTMF, or touch-tone telephone dialing, in which a detection circuit must constantly monitor the line 
for two simultaneous frequencies indicating that a telephone button is depressed. Goertzel's algorithm 
reduces the number of real-valued multiplications by almost a factor of two relative to direct 
computation via the DFT equation. Goertzel's algorithm is thus useful for computing a few frequency 
values; if many or most DFT values are needed, FFT algorithms that compute all DFT samples in 
O(N log N) operations are faster. Goertzel's algorithm can be derived by converting the DFT equation 
into an equivalent form as a convolution, which can be efficiently implemented as a digital filter. For 
increased clarity, in the equations below the complex exponential is denoted as $e^{-j\frac{2\pi k}{N}} = W_N^k$. Note 
that because $W_N^{-Nk}$ always equals 1, the DFT equation can be rewritten as a convolution, or filtering 
operation: 

Equation: 


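$$\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n)\,(1)\,W_N^{nk} \\
&= \sum_{n=0}^{N-1} x(n)\,W_N^{-Nk}\,W_N^{nk} \\
&= \sum_{n=0}^{N-1} x(n)\,W_N^{(n-N)k} \\
&= \left(\left(\left(W_N^{-k}x(0) + x(1)\right)W_N^{-k} + x(2)\right)W_N^{-k} + \cdots + x(N-1)\right)W_N^{-k}
\end{aligned}$$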


Note that this last expression can be written in terms of a recursive difference equation 


$$y(n) = W_N^{-k}\,y(n-1) + x(n)$$
where y(−1) = 0. The DFT coefficient equals the output of the difference equation at time n = N: 
$$X(k) = y(N)$$


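Expressing the difference equation as a z-transform and multiplying both numerator and denominator 
by $1 - W_N^{k} z^{-1}$ gives the transfer function 


$$\frac{Y(z)}{X(z)} = H(z) = \frac{1}{1 - W_N^{-k} z^{-1}} = \frac{1 - W_N^{k} z^{-1}}{1 - \left(\left(W_N^{k} + W_N^{-k}\right) z^{-1} - z^{-2}\right)} = \frac{1 - W_N^{k} z^{-1}}{1 - \left(2\cos\left(\frac{2\pi k}{N}\right) z^{-1} - z^{-2}\right)}$$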


This system can be realized by the structure in [link] 


We want y(n) not for all n, but only for n = N. We can thus compute only the recursive part, or just 
the left side of the flow graph in [link], for n = [0, 1, ..., N], which involves only a real/complex 
product rather than a complex/complex product as in a direct DFT, plus one complex multiply to get 
y(N) = X(k). 


Note: The input x(N) at time n = N must equal 0! A slightly more efficient alternate implementation 
that computes the full recursion only through n = N − 1 and combines the nonzero operations of the 
final recursion with the final complex multiply can be found here, complete with pseudocode (for 
real-valued data). 


If the data are real-valued, only real/real multiplications and real additions are needed until the final 
multiply. 


Note: The computational cost of Goertzel's algorithm is thus 2N + 2 real multiplies and 4N − 2 real 
adds, a reduction of almost a factor of two in the number of real multiplies relative to direct 
computation via the DFT equation. If the data are real-valued, this cost is almost halved again. 
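
The recursion and final multiply described above can be sketched as follows, in Python (standard library only). The function name `goertzel` and the variable names are illustrative assumptions; the structure follows the transfer function above, with the second-order recursive part run for n = 0, ..., N (input forced to zero at n = N) and the numerator 1 − W_N^k z^{−1} applied once at the end.

```python
import cmath
import math

def goertzel(x, k):
    """DFT sample X(k) of the sequence x via Goertzel's second-order recursion."""
    n_len = len(x)
    theta = 2 * math.pi * k / n_len
    coeff = 2 * math.cos(theta)          # the only multiplier inside the loop (real)
    s_prev, s_prev2 = 0.0, 0.0
    # Recursive part for n = 0..N, with the input forced to zero at n = N.
    for sample in list(x) + [0.0]:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Non-recursive part, evaluated once: y(N) = s(N) - W_N^k s(N-1).
    return s_prev - cmath.exp(-1j * theta) * s_prev2
```

For real-valued input every operation inside the loop is real, and the single complex multiply happens only after the loop, which is where the savings over the direct DFT sum come from.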


For certain frequencies, additional simplifications requiring even fewer multiplications are possible. 
(For example, for the DC (k = 0) frequency, all the multipliers equal 1 and only additions are needed.) 
A correspondence by C.G. Boncelet, Jr. describes some of these additional simplifications. Once again, 
Goertzel's and Boncelet's algorithms are efficient for a few DFT frequency samples; if more than log N 
frequencies are needed, O(N log N) FFT algorithms that compute all frequencies simultaneously will 
be more efficient. 


Power-of-two FFTs 


FFTs of length N = 2^M equal to a power of two are, by far, the most 
commonly used. These algorithms are very efficient, relatively simple, and 
a single program can compute power-of-two FFTs of different lengths. As 
with most FFT algorithms, they gain their efficiency by computing all DFT 
points simultaneously through extensive reuse of intermediate 
computations; they are thus efficient when many DFT frequency samples 
are needed. The simplest power-of-two FFTs are the decimation-in-time 
radix-2 FFT and the decimation-in-frequency radix-2 FFT; they reduce the 
length-N = 2^M DFT to a series of length-2 DFT computations with 
twiddle-factor complex multiplications between them. The radix-4 FFT 
algorithm similarly reduces a length-N = 4^M DFT to a series of length-4 
DFT computations with twiddle-factor multiplies in between. Radix-4 FFTs 
require only 75% as many complex multiplications as the radix-2 
algorithms, although the number of complex additions remains the same. 
Radix-8 and higher-radix FFT algorithms can be derived using multi- 
dimensional index maps to reduce the computational complexity a bit more. 
However, the split-radix algorithm and its recent extensions combine the 
best elements of the radix-2 and radix-4 algorithms to obtain lower 
complexity than either or than any higher radix, requiring only two-thirds as 
many complex multiplies as the radix-2 algorithms. All of these algorithms 
obtain huge savings over direct computation of the DFT, reducing the 
complexity from O(N?) to O(N log N). 


The efficiency of an FFT implementation depends on more than just the 
number of computations. Efficient FFT programming tricks can make up to 
a several-fold difference in the run-time of FFT programs. Alternate FFT 
structures can lead to a more convenient data flow for certain hardware. As 
discussed in choosing the best FFT algorithm, certain hardware is designed 
for, and thus most efficient for, FFTs of specific lengths or radices. 


Decimation-in-time (DIT) Radix-2 FFT 


The radix-2 decimation-in-time and decimation-in-frequency fast Fourier transforms (FFTs)
are the simplest FFT algorithms. Like all FFTs, they gain their speed by reusing the results of
smaller, intermediate computations to compute multiple DFT frequency outputs.


Decimation in time 


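The radix-2 decimation-in-time algorithm rearranges the discrete Fourier transform (DFT)
equation into two parts: a sum over the even-numbered discrete-time indices
$n = [0, 2, 4, \ldots, N-2]$ and a sum over the odd-numbered indices $n = [1, 3, 5, \ldots, N-1]$,
as in [link]:

Equation:

\[
\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n) e^{-i \frac{2\pi n k}{N}} \\
&= \sum_{n=0}^{\frac{N}{2}-1} x(2n) e^{-i \frac{2\pi (2n) k}{N}} + \sum_{n=0}^{\frac{N}{2}-1} x(2n+1) e^{-i \frac{2\pi (2n+1) k}{N}} \\
&= \sum_{n=0}^{\frac{N}{2}-1} x(2n) e^{-i \frac{2\pi n k}{N/2}} + e^{-i \frac{2\pi k}{N}} \sum_{n=0}^{\frac{N}{2}-1} x(2n+1) e^{-i \frac{2\pi n k}{N/2}} \\
&= \mathrm{DFT}_{\frac{N}{2}}\left[[x(0), x(2), \ldots, x(N-2)]\right] + W_N^k \, \mathrm{DFT}_{\frac{N}{2}}\left[[x(1), x(3), \ldots, x(N-1)]\right]
\end{aligned}
\]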


The mathematical simplifications in [link] reveal that all DFT frequency outputs $X(k)$ can be
computed as the sum of the outputs of two length-$N/2$ DFTs, of the even-indexed and
odd-indexed discrete-time samples, respectively, where the odd-indexed short DFT is multiplied
by a so-called twiddle factor term $W_N^k = e^{-i \frac{2\pi k}{N}}$. This is called a decimation in time
because the time samples are rearranged in alternating groups, and a radix-2 algorithm
because there are two groups. [link] graphically illustrates this form of the DFT computation,
where for convenience the frequency outputs of the length-$N/2$ DFT of the even-indexed time
samples are denoted $G(k)$ and those of the odd-indexed samples $H(k)$. Because these
length-$N/2$ DFTs are periodic with period $N/2$ in frequency, $G(k)$ and $H(k)$ can be used
to compute two of the length-$N$ DFT frequencies, namely $X(k)$ and $X(k + N/2)$, but with a
different twiddle factor. This reuse of these short-length DFT outputs gives the FFT its
computational savings.

Decimation in time of a length-$N$ DFT into two length-$N/2$ DFTs
followed by a combining stage.


Whereas direct computation of all $N$ DFT frequencies according to the DFT equation would
require $N^2$ complex multiplies and $N^2 - N$ complex additions (for complex-valued data),
by reusing the results of the two short-length DFTs as illustrated in [link], the computational
cost is now

New Operation Counts


• $2\left(\frac{N}{2}\right)^2 + N = \frac{N^2}{2} + N$ complex multiplies
• $2\,\frac{N}{2}\left(\frac{N}{2} - 1\right) + N = \frac{N^2}{2}$ complex additions


This simple reorganization and reuse has reduced the total computation by almost a factor of
two over direct DFT computation!


Additional Simplification 


A basic butterfly operation is shown in [link], which requires only $N/2$ twiddle-factor
multiplies per stage. It is worthwhile to note that, after merging the twiddle factors to a single
term on the lower branch, the remaining butterfly is actually a length-2 DFT! The theory of
multi-dimensional index maps shows that this must be the case, and that FFTs of any
factorable length may consist of successive stages of shorter-length FFTs with twiddle-factor
multiplications in between.


Radix-2 DIT butterfly simplification: both operations produce the same outputs


Radix-2 decimation-in-time FFT 


The same radix-2 decimation in time can be applied recursively to the two length-$N/2$ DFTs to
save computation. When successively applied until the shorter and shorter DFTs reach length-2,
the result is the radix-2 DIT FFT algorithm.

Radix-2 Decimation-in-Time FFT algorithm for a length-8 signal


The full radix-2 decimation-in-time decomposition illustrated in [link] using the simplified
butterflies involves $M = \log_2 N$ stages, each with $N/2$ butterflies. Each butterfly
requires 1 complex multiply and 2 complex adds. The total cost of the algorithm is thus

Computational cost of radix-2 DIT FFT


• $\frac{N}{2} \log_2 N$ complex multiplies
• $N \log_2 N$ complex adds


This is a remarkable savings over direct computation of the DFT. For example, a length-1024 
DFT would require 1048576 complex multiplications and 1047552 complex additions with 
direct computation, but only 5120 complex multiplications and 10240 complex additions 
using the radix-2 FFT, a savings by a factor of 100 or more. The relative savings increase with 
longer FFT lengths, and are less for shorter lengths. 
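
The figures quoted for the length-1024 case follow directly from the closed-form counts. The short sketch below reproduces them; the function names are illustrative and simply encode the formulas $N^2$, $N^2 - N$, $\frac{N}{2}\log_2 N$ and $N \log_2 N$.

```c
/* Complex-operation counts for a length-n DFT, with n = 2^m.
   Function names are illustrative; each encodes a formula from the text. */
long direct_mults(long n)        { return n * n; }        /* N^2            */
long direct_adds(long n)         { return n * n - n; }    /* N^2 - N        */
long radix2_mults(long n, int m) { return (n / 2) * m; }  /* (N/2) log2 N   */
long radix2_adds(long n, int m)  { return n * m; }        /* N log2 N       */
```

For n = 1024 and m = 10 the ratio of direct to radix-2 multiplies is about 205, consistent with the "factor of 100 or more" stated above.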


Modest additional reductions in computation can be achieved by noting that certain twiddle
factors, namely $W_N^0$, $W_N^{N/2}$, $W_N^{N/4}$, $W_N^{N/8}$, and $W_N^{3N/8}$, require no
multiplications, or fewer real multiplies than other ones. By implementing special butterflies
for these twiddle factors as discussed in FFT algorithm and programming tricks, the
computational cost of the radix-2 decimation-in-time FFT can be reduced to


• $2N \log_2 N - 7N + 12$ real multiplies
• $3N \log_2 N - 3N + 4$ real additions


Note: In a decimation-in-time radix-2 FFT as illustrated in [link], the input is in bit-reversed
order (hence "decimation-in-time"). That is, if the time-sample index n is written as a binary
number, the order is that binary number reversed. The bit-reversal process is illustrated for a
length-N = 8 example below.


Example:


N = 8


In-order index   In-order index in binary   Bit-reversed binary   Bit-reversed index
0                000                        000                   0
1                001                        100                   4
2                010                        010                   2
3                011                        110                   6
4                100                        001                   1
5                101                        101                   5
6                110                        011                   3
7                111                        111                   7
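
The mapping in this table can be computed one bit at a time. The helper below is a minimal sketch (the name `bit_reverse` is an assumption for this example); it is written for clarity rather than speed.

```c
/* Reverse the lowest `bits` bits of index i.
   Hypothetical helper for illustration; not optimized. */
unsigned bit_reverse(unsigned i, int bits)
{
    unsigned r = 0;
    int b;
    for (b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* append the lowest remaining bit of i */
        i >>= 1;
    }
    return r;
}
```

For N = 8 (three bits), `bit_reverse(1, 3)` gives 4 and `bit_reverse(3, 3)` gives 6, matching the table rows above. Applying the function twice returns the original index.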


It is important to note that, if the input signal data are placed in bit-reversed order before 
beginning the FFT computations, the outputs of each butterfly throughout the computation 
can be placed in the same memory locations from which the inputs were fetched, resulting in 
an in-place algorithm that requires no extra memory to perform the FFT. Most FFT 
implementations are in-place, and overwrite the input data with the intermediate values and 
finally the output. 


Example FFT Code 


The following function, written in the C programming language, implements a radix-2 
decimation-in-time FFT. It is designed for computing the DFT of complex-valued inputs to 
produce complex-valued outputs, with the real and imaginary parts of each number stored in 
separate double-precision floating-point arrays. It is an in-place algorithm, so the intermediate 
and final output values are stored in the same array as the input data, which is overwritten. 
After initializations, the program first bit-reverses the discrete-time samples, as is typical with 
a decimation-in-time algorithm (but see alternate FFT structures for DIT algorithms with 
other input orders), then computes the FFT in stages according to the above description. 


This FFT program uses a standard three-loop structure for the main FFT computation. The
outer loop steps through the stages (each column in [link]); the middle loop steps through
"flights" (butterflies with the same twiddle factor from each short-length DFT at each stage),
and the inner loop steps through the individual butterflies. This ordering minimizes the
number of fetches or computations of the twiddle-factor values. Since the bit-reverse of a
bit-reversed index is the original index, bit-reversal can be performed fairly simply by swapping
pairs of data.


Note: While of O(N log N) complexity and thus much faster than a direct DFT, this simple
program is optimized for clarity, not for speed. A speed-optimized program making use of
additional efficient FFT algorithm and programming tricks will compute a DFT several times
faster on most machines.

/**********************************************************/
/* fft.c                                                  */
/* (c) Douglas L. Jones                                   */
/* University of Illinois at Urbana-Champaign             */
/* January 19, 1992                                       */
/*                                                        */
/*   fft: in-place radix-2 DIT DFT of a complex input     */
/*                                                        */
/*   input:                                               */
/* n: length of FFT: must be a power of two               */
/* m: n = 2**m                                            */
/*   input/output                                         */
/* x: double array of length n with real part of data     */
/* y: double array of length n with imag part of data     */
/*                                                        */
/*   Permission to copy and use this program is granted   */
/*   under a Creative Commons "Attribution" license       */
/*   http://creativecommons.org/licenses/by/1.0/          */
/**********************************************************/
#include <math.h>

void fft(int n, int m, double x[], double y[])
{
  int i, j, k, n1, n2;
  double c, s, e, a, t1, t2;

  j = 0;                      /* bit-reverse */
  n2 = n/2;
  for (i=1; i < n - 1; i++)
  {
    n1 = n2;
    while ( j >= n1 )         /* add 1 to j in bit-reversed arithmetic */
    {
      j = j - n1;
      n1 = n1/2;
    }
    j = j + n1;

    if (i < j)                /* swap each pair only once */
    {
      t1 = x[i];
      x[i] = x[j];
      x[j] = t1;
      t1 = y[i];
      y[i] = y[j];
      y[j] = t1;
    }
  }

  n1 = 0;                     /* FFT */
  n2 = 1;

  for (i=0; i < m; i++)       /* stages */
  {
    n1 = n2;                  /* distance between butterfly inputs */
    n2 = n2 + n2;             /* length of the DFTs formed at this stage */
    e = -6.283185307179586/n2;
    a = 0.0;

    for (j=0; j < n1; j++)    /* flights: one twiddle factor each */
    {
      c = cos(a);
      s = sin(a);
      a = a + e;

      for (k=j; k < n; k=k+n2)   /* butterflies sharing this twiddle factor */
      {
        t1 = c*x[k+n1] - s*y[k+n1];
        t2 = s*x[k+n1] + c*y[k+n1];
        x[k+n1] = x[k] - t1;
        y[k+n1] = y[k] - t2;
        x[k] = x[k] + t1;
        y[k] = y[k] + t2;
      }
    }
  }

  return;
}


Decimation-in-Frequency (DIF) Radix-2 FFT 


The radix-2 decimation-in-frequency and decimation-in-time fast Fourier 
transforms (FFTs) are the simplest FFT algorithms. Like all FFTs, they 
compute the discrete Fourier transform (DFT) 

Equation:


\[
X(k) = \sum_{n=0}^{N-1} x(n) e^{-i \frac{2\pi n k}{N}} = \sum_{n=0}^{N-1} x(n) W_N^{nk}
\]


where for notational convenience $W_N^k = e^{-i \frac{2\pi k}{N}}$. FFT algorithms gain
their speed by reusing the results of smaller, intermediate computations to
compute multiple DFT frequency outputs.


Decimation in frequency 


The radix-2 decimation-in-frequency algorithm rearranges the discrete
Fourier transform (DFT) equation into two parts: computation of the
even-numbered discrete-frequency indices $X(k)$ for $k = [0, 2, 4, \ldots, N-2]$ (or
$X(2r)$ as in [link]) and computation of the odd-numbered indices
$k = [1, 3, 5, \ldots, N-1]$ (or $X(2r+1)$ as in [link]).

Equation:


\[
X(2r) = \sum_{n=0}^{N-1} x(n) W_N^{2rn}
\]

Equation:


\[
X(2r+1) = \sum_{n=0}^{N-1} x(n) W_N^{(2r+1)n}
\]

The mathematical simplifications in [link] and [link] reveal that both the
even-indexed and odd-indexed frequency outputs $X(k)$ can each be
computed by a length-$N/2$ DFT. The inputs to these DFTs are sums or
differences of the first and second halves of the input signal, respectively,
where the input to the short DFT producing the odd-indexed frequencies is
multiplied by a so-called twiddle factor term $W_N^n = e^{-i \frac{2\pi n}{N}}$. This is
called a decimation in frequency because the frequency samples are
computed separately in alternating groups, and a radix-2 algorithm because
there are two groups. [link] graphically illustrates this form of the DFT
computation. This conversion of the full DFT into a series of shorter DFTs
with a simple preprocessing step gives the decimation-in-frequency FFT its
computational savings.
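
The intermediate steps of this simplification can be written out as follows. This is a reconstruction from the definitions above, using only the identities $W_N^{rN} = 1$, $W_N^{N/2} = -1$ and $W_N^{2rn} = W_{N/2}^{rn}$; it is offered as a worked derivation rather than a quotation of the original equations.

```latex
\begin{aligned}
X(2r) &= \sum_{n=0}^{N-1} x(n) W_N^{2rn} \\
      &= \sum_{n=0}^{\frac{N}{2}-1} x(n) W_N^{2rn}
       + \sum_{n=0}^{\frac{N}{2}-1} x\!\left(n+\tfrac{N}{2}\right) W_N^{2r\left(n+\frac{N}{2}\right)} \\
      &= \sum_{n=0}^{\frac{N}{2}-1} \left[ x(n) + x\!\left(n+\tfrac{N}{2}\right) \right] W_{N/2}^{rn}
       = \mathrm{DFT}_{\frac{N}{2}}\!\left[ x(n) + x\!\left(n+\tfrac{N}{2}\right) \right] \\[1ex]
X(2r+1) &= \sum_{n=0}^{N-1} x(n) W_N^{(2r+1)n} \\
      &= \sum_{n=0}^{\frac{N}{2}-1} \left[ x(n) + W_N^{N/2}\, x\!\left(n+\tfrac{N}{2}\right) \right] W_N^{(2r+1)n} \\
      &= \sum_{n=0}^{\frac{N}{2}-1} \left[ \left( x(n) - x\!\left(n+\tfrac{N}{2}\right) \right) W_N^{n} \right] W_{N/2}^{rn}
       = \mathrm{DFT}_{\frac{N}{2}}\!\left[ \left( x(n) - x\!\left(n+\tfrac{N}{2}\right) \right) W_N^{n} \right]
\end{aligned}
```

The sum of the two halves feeds the even-frequency DFT, and the twiddled difference feeds the odd-frequency DFT.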


Decimation in frequency of a length-$N$ DFT into two length-$N/2$ DFTs
preceded by a preprocessing stage.

Whereas direct computation of all $N$ DFT frequencies according to the
DFT equation would require $N^2$ complex multiplies and $N^2 - N$ complex
additions (for complex-valued data), by breaking the computation into two
short-length DFTs with some preliminary combining of the data, as
illustrated in [link], the computational cost is now

New Operation Counts


• $2\left(\frac{N}{2}\right)^2 + \frac{N}{2} = \frac{N^2}{2} + \frac{N}{2}$ complex multiplies
• $2\,\frac{N}{2}\left(\frac{N}{2} - 1\right) + N = \frac{N^2}{2}$ complex additions


This simple manipulation has reduced the total computational cost of the
DFT by almost a factor of two!


The initial combining operations for both short-length DFTs involve
parallel groups of two time samples, $x(n)$ and $x(n + N/2)$. One of these
so-called butterfly operations is illustrated in [link]. There are $N/2$ butterflies
per stage, each requiring a complex addition and subtraction followed by
one twiddle-factor multiplication by $W_N^n = e^{-i \frac{2\pi n}{N}}$ on the lower output
branch.


DIF butterfly: twiddle factor after length-2 DFT


It is worthwhile to note that the initial add/subtract part of the DIF butterfly 
is actually a length-2 DFT! The theory of multi-dimensional index maps 
shows that this must be the case, and that FFTs of any factorable length may 
consist of successive stages of shorter-length FFTs with twiddle-factor 
multiplications in between. It is also worth noting that this butterfly differs 
from the decimation-in-time radix-2 butterfly in that the twiddle factor 
multiplication occurs after the combining. 


Radix-2 decimation-in-frequency algorithm 


The same radix-2 decimation in frequency can be applied recursively to the
two length-$N/2$ DFTs to save additional computation. When successively
applied until the shorter and shorter DFTs reach length-2, the result is the
radix-2 decimation-in-frequency FFT algorithm.

Radix-2 decimation-in-frequency FFT for a length-8 signal


The full radix-2 decimation-in-frequency decomposition illustrated in [link]
requires $M = \log_2 N$ stages, each with $N/2$ butterflies. Each
butterfly requires 1 complex multiply and 2 complex adds. The total
cost of the algorithm is thus

Computational cost of radix-2 DIF FFT


• $\frac{N}{2} \log_2 N$ complex multiplies
• $N \log_2 N$ complex adds


This is a remarkable savings over direct computation of the DFT. For 
example, a length-1024 DFT would require 1048576 complex 
multiplications and 1047552 complex additions with direct computation, 
but only 5120 complex multiplications and 10240 complex additions using 
the radix-2 FFT, a savings by a factor of 100 or more. The relative savings 
increase with longer FFT lengths, and are less for shorter lengths. Modest
additional reductions in computation can be achieved by noting that certain
twiddle factors, namely $W_N^0$, $W_N^{N/2}$, $W_N^{N/4}$, $W_N^{N/8}$, and $W_N^{3N/8}$, require no
multiplications, or fewer real multiplies than other ones. By implementing
special butterflies for these twiddle factors as discussed in FFT algorithm
and programming tricks, the computational cost of the radix-2
decimation-in-frequency FFT can be reduced to


• $2N \log_2 N - 7N + 12$ real multiplies
• $3N \log_2 N - 3N + 4$ real additions


The decimation-in-frequency FFT is a flow-graph reversal of the 
decimation-in-time FFT: it has the same twiddle factors (in reverse pattern) 
and the same operation counts. 


Note: In a decimation-in-frequency radix-2 FFT as illustrated in [link], the
output is in bit-reversed order (hence "decimation-in-frequency"). That is,
if the frequency-sample index k is written as a binary number, the order is
that binary number reversed. The bit-reversal process is illustrated here.


It is important to note that, if the input data are in order before beginning 
the FFT computations, the outputs of each butterfly throughout the 
computation can be placed in the same memory locations from which the 
inputs were fetched, resulting in an in-place algorithm that requires no 
extra memory to perform the FFT. Most FFT implementations are in-place, 
and overwrite the input data with the intermediate values and finally the 
output. 


Alternate FFT Structures 


Bit-reversing the input in decimation-in-time (DIT) FFTs or the output in 
decimation-in-frequency (DIF) FFTs can sometimes be inconvenient or 
inefficient. For such situations, alternate FFT structures have been 
developed. Such structures involve the same mathematical computations as 
the standard algorithms, but alter the memory locations in which
intermediate values are stored or the order of computation of the FFT
butterflies.


The structure in [link] computes a decimation-in-frequency FFT, but 
remaps the memory usage so that the input is bit-reversed, and the output is 
in-order as in the conventional decimation-in-time FFT. This alternate 
structure is still considered a DIF FFT because the twiddle factors are 
applied as in the DIF FFT. This structure is useful if for some reason the 
DIF butterfly is preferred but it is easier to bit-reverse the input. 


Decimation-in-frequency radix-2 FFT with bit-reversed input. This is
an in-place algorithm in which the same memory can be reused
throughout the computation.

There is a similar structure for the decimation-in-time FFT with in-order
inputs and bit-reversed frequencies. This structure can be useful for fast
convolution on machines that favor decimation-in-time algorithms because
the filter can be stored in bit-reverse order, and then the inverse FFT returns
an in-order result without ever bit-reversing any data. As discussed in
Efficient FFT Programming Tricks, this may save several percent of the
execution time.


The structure in [link] implements a decimation-in-frequency FFT that has 
both input and output in order. It thus avoids the need for bit-reversing 
altogether. Unfortunately, it destroys the in-place structure somewhat, 
making an FFT program more complicated and requiring more memory; on 
most machines the resulting cost exceeds the benefits. This structure can be 
computed in place if two butterflies are computed simultaneously.

Decimation-in-frequency radix-2 FFT with in-order input and output.
It can be computed in-place if two butterflies are computed
simultaneously.


The structure in [link] has a constant geometry; the connections between 
memory locations are identical in each FFT stage. Since it is not in-place 
and requires bit-reversal, it is inconvenient for software implementation, but 
can be attractive for a highly parallel hardware implementation because the 
connections between stages can be hardwired. An analogous structure exists 
that has bit-reversed inputs and in-order outputs.

This constant-geometry structure has the same interconnect pattern
from stage to stage. This structure is sometimes useful for special
hardware.


Radix-4 FFT Algorithms 


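The radix-4 decimation-in-time and decimation-in-frequency fast Fourier transforms (FFTs) gain their speed by
reusing the results of smaller, intermediate computations to compute multiple DFT frequency outputs. The radix-4
decimation-in-time algorithm rearranges the discrete Fourier transform (DFT) equation into four parts: sums over
all groups of every fourth discrete-time index $n = [0, 4, 8, \ldots, N-4]$, $n = [1, 5, 9, \ldots, N-3]$,
$n = [2, 6, 10, \ldots, N-2]$ and $n = [3, 7, 11, \ldots, N-1]$ as in [link]. (This works out only when the FFT length is
a multiple of four.) Just as in the radix-2 decimation-in-time FFT, further mathematical manipulation shows that
the length-$N$ DFT can be computed as the sum of the outputs of four length-$N/4$ DFTs, one for each of these
groups of discrete-time samples, where three of them are multiplied by so-called twiddle factors
$W_N^k = e^{-i \frac{2\pi k}{N}}$, $W_N^{2k}$, and $W_N^{3k}$.

Equation:


\[
\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n) e^{-i \frac{2\pi n k}{N}} \\
&= \sum_{n=0}^{\frac{N}{4}-1} x(4n) e^{-i \frac{2\pi (4n) k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+1) e^{-i \frac{2\pi (4n+1) k}{N}} \\
&\quad + \sum_{n=0}^{\frac{N}{4}-1} x(4n+2) e^{-i \frac{2\pi (4n+2) k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+3) e^{-i \frac{2\pi (4n+3) k}{N}} \\
&= \mathrm{DFT}_{\frac{N}{4}}[x(4n)] + W_N^{k} \, \mathrm{DFT}_{\frac{N}{4}}[x(4n+1)] + W_N^{2k} \, \mathrm{DFT}_{\frac{N}{4}}[x(4n+2)] + W_N^{3k} \, \mathrm{DFT}_{\frac{N}{4}}[x(4n+3)]
\end{aligned}
\]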


This is called a decimation in time because the time samples are rearranged in alternating groups, and a radix-4
algorithm because there are four groups. [link] graphically illustrates this form of the DFT computation.

Radix-4 DIT structure

Decimation in time of a length-$N$ DFT into four length-$N/4$ DFTs
followed by a combining stage.


Due to the periodicity with $N/4$ of the short-length DFTs, their outputs for frequency-sample $k$ are reused to
compute $X(k)$, $X(k + N/4)$, $X(k + N/2)$, and $X(k + 3N/4)$. It is this reuse that gives the radix-4 FFT its efficiency.
The computations involved with each group of four frequency samples constitute the radix-4 butterfly, which is
shown in [link]. Through further rearrangement, it can be shown that this computation can be simplified to three
twiddle-factor multiplies and a length-4 DFT! The theory of multi-dimensional index maps shows that this must be
the case, and that FFTs of any factorable length may consist of successive stages of shorter-length FFTs with
twiddle-factor multiplications in between. The length-4 DFT requires no multiplies and only eight complex
additions (this efficient computation can be derived using a radix-2 FFT).

The radix-4 DIT butterfly can be simplified to a length-4 DFT preceded by three
twiddle-factor multiplies.
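

The claim that a length-4 DFT needs no multiplies and only eight complex additions can be checked with a short sketch. The function name `dft4` and the separate real/imaginary arrays are assumptions for this illustration. Multiplication by $-i$ is carried out by swapping real and imaginary parts with a sign change, so no multiplications appear.

```c
/* Length-4 DFT using 8 complex additions and no multiplications.
   (xr, xi) is the input, (Xr, Xi) the output. Illustrative sketch.
   Multiplying (a + ib) by -i gives (b - ia): a swap and a sign change. */
void dft4(const double xr[4], const double xi[4], double Xr[4], double Xi[4])
{
    /* First stage: two length-2 DFTs (4 complex additions) */
    double ar = xr[0] + xr[2], ai = xi[0] + xi[2];   /* a = x0 + x2 */
    double br = xr[0] - xr[2], bi = xi[0] - xi[2];   /* b = x0 - x2 */
    double cr = xr[1] + xr[3], ci = xi[1] + xi[3];   /* c = x1 + x3 */
    double dr = xr[1] - xr[3], di = xi[1] - xi[3];   /* d = x1 - x3 */

    /* Second stage: combine (4 more complex additions) */
    Xr[0] = ar + cr;  Xi[0] = ai + ci;   /* X(0) = a + c   */
    Xr[2] = ar - cr;  Xi[2] = ai - ci;   /* X(2) = a - c   */
    Xr[1] = br + di;  Xi[1] = bi - dr;   /* X(1) = b - i d */
    Xr[3] = br - di;  Xi[3] = bi + dr;   /* X(3) = b + i d */
}
```

In a radix-4 butterfly, the three twiddle-factor multiplies would be applied to inputs 1, 2 and 3 before calling a routine of this kind.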


If the FFT length $N = 4^M$, the shorter-length DFTs can be further decomposed recursively in the same manner to
produce the full radix-4 decimation-in-time FFT. As in the radix-2 decimation-in-time FFT, each stage of
decomposition creates additional savings in computation. To determine the total computational cost of the radix-4
FFT, note that there are $M = \log_4 N = \frac{\log_2 N}{2}$ stages, each with $N/4$ butterflies. Each radix-4 butterfly
requires 3 complex multiplies and 8 complex additions. The total cost is then


Radix-4 FFT Operation Counts


• $3 \cdot \frac{N}{4} \cdot \frac{\log_2 N}{2} = \frac{3}{8} N \log_2 N$ complex multiplies (75% of a radix-2 FFT)
• $8 \cdot \frac{N}{4} \cdot \frac{\log_2 N}{2} = N \log_2 N$ complex adds (same as a radix-2 FFT)


The radix-4 FFT requires only 75% as many complex multiplies as the radix-2 FFTs, although it uses the same 
number of complex additions. These additional savings make it a widely-used FFT algorithm. 


The decimation-in-time operation regroups the input samples at each successive stage of decomposition, resulting 
in a "digit-reversed" input order. That is, if the time-sample index n is written as a base-4 number, the order is that 
base-4 number reversed. The digit-reversal process is illustrated for a length-N = 64 example below. 


Example: 
$N = 64 = 4^3$


Original Number   Original Digit Order   Reversed Digit Order   Digit-Reversed Number
0                 000                    000                    0
1                 001                    100                    16
2                 002                    200                    32
3                 003                    300                    48
4                 010                    010                    4
5                 011                    110                    20

It is important to note that, if the input signal data are placed in digit-reversed order before beginning the FFT 
computations, the outputs of each butterfly throughout the computation can be placed in the same memory 
locations from which the inputs were fetched, resulting in an in-place algorithm that requires no extra memory to 
perform the FFT. Most FFT implementations are in-place, and overwrite the input data with the intermediate 
values and finally the output. A slight rearrangement within the radix-4 FFT introduced by Burrus allows the 
inputs to be arranged in bit-reversed rather than digit-reversed order. 


A radix-4 decimation-in-frequency FFT can be derived similarly to the radix-2 DIF FFT, by separately computing 
all four groups of every fourth output frequency sample. The DIF radix-4 FFT is a flow-graph reversal of the DIT 
radix-4 FFT, with the same operation counts and twiddle factors in the reversed order. The output ends up in digit- 
reversed order for an in-place DIF algorithm. 

Exercise: 


Problem: How do we derive a radix-4 algorithm when $N = 4^M \cdot 2$?
Solution: 


Perform a radix-2 decomposition for one stage, then radix-4 decompositions of all subsequent shorter-length 
DFTs. 


Split-radix FFT Algorithms 


The split-radix algorithm, first clearly described and named by Duhamel and Hollmann in 1984, required
fewer total multiply and add operations than any previous power-of-two algorithm. (Yavne
first derived essentially the same algorithm in 1968, but the description was so atypical that the work was
largely neglected.) For a time many FFT experts thought it to be optimal in terms of total complexity, but
even more efficient variations have more recently been discovered by Johnson and Frigo.


The split-radix algorithm can be derived by careful examination of the radix-2 and radix-4 flowgraphs as
in Figure 1 below. While in most places the radix-4 algorithm has fewer nontrivial twiddle factors, in
some places the radix-2 actually lacks twiddle factors present in the radix-4 structure or those twiddle
factors simplify to multiplication by $-i$, which actually requires only additions. By mixing radix-2 and
radix-4 computations appropriately, an algorithm of lower complexity than either can be derived.

Motivation for split-radix algorithm

See Decimation-in-Time (DIT) Radix-2 FFT and Radix-4
FFT Algorithms for more information on these algorithms.


An alternative derivation notes that radix-2 butterflies of the form shown in Figure 2 can merge twiddle 
factors from two successive stages to eliminate one-third of them; hence, the split-radix algorithm 
requires only about two-thirds as many multiplications as a radix-2 FFT. 


Note that these two butterflies are equivalent 


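The split-radix algorithm can also be derived by mixing the radix-2 and radix-4 decompositions.
Equation:


DIT Split-radix derivation


\[
\begin{aligned}
X(k) &= \sum_{n=0}^{\frac{N}{2}-1} x(2n) e^{-i \frac{2\pi (2n) k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+1) e^{-i \frac{2\pi (4n+1) k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+3) e^{-i \frac{2\pi (4n+3) k}{N}} \\
&= \mathrm{DFT}_{\frac{N}{2}}[x(2n)] + W_N^{k} \, \mathrm{DFT}_{\frac{N}{4}}[x(4n+1)] + W_N^{3k} \, \mathrm{DFT}_{\frac{N}{4}}[x(4n+3)]
\end{aligned}
\]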


Figure 3 illustrates the resulting split-radix butterfly.

Decimation-in-Time Split-Radix Butterfly

The split-radix butterfly mixes radix-2 and
radix-4 decompositions and is L-shaped


Further decomposition of the half- and quarter-length DFTs yields the full split-radix algorithm. The mix
of different-length FFTs in different parts of the flowgraph results in a somewhat irregular algorithm;
Sorensen et al. show how to adjust the computation such that the data retains the simpler radix-2
bit-reverse order. A decimation-in-frequency split-radix FFT can be derived analogously.


The split-radix transform has L-shaped butterflies


The multiplicative complexity of the split-radix algorithm is only about two-thirds that of the radix-2 
FFT, and is better than the radix-4 FFT or any higher power-of-two radix as well. The additions within 
the complex twiddle-factor multiplies are similarly reduced, but since the underlying butterfly tree 
remains the same in all power-of-two algorithms, the butterfly additions remain the same and the overall 
reduction in additions is much less. 


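             Complex M/As                           Real M/As (4/2)                                                    Real M/As (3/3)
Multiplies   $O\left[\frac{N}{3}\log_2 N\right]$    $\frac{4}{3}N\log_2 N - \frac{38}{9}N + 6 + \frac{2}{9}(-1)^M$    $N\log_2 N - 3N + 4$
Additions    $O\left[N\log_2 N\right]$              $\frac{8}{3}N\log_2 N - \frac{16}{9}N + 2 - \frac{2}{9}(-1)^M$    $3N\log_2 N - 3N + 4$


Operation Counts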
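
As a quick consistency check on the real-operation counts, the sketch below evaluates the commonly cited closed forms for the split-radix FFT with 3-multiply/3-add and 4-multiply/2-add complex multiplication, together with the radix-2 multiply count given earlier in this text. The function names are illustrative, and the closed forms are restated in the comments so that the block can be read on its own.

```c
/* Real-operation counts for a length-n FFT with n = 2^m.
   Names are illustrative; formulas are the commonly cited closed forms. */

/* Split-radix, 3 multiplies / 3 adds per complex multiply */
long sr_mults_33(long n, int m) { return n * m - 3 * n + 4; }      /* N log2 N - 3N + 4  */
long sr_adds_33(long n, int m)  { return 3 * n * m - 3 * n + 4; }  /* 3N log2 N - 3N + 4 */

/* Split-radix, 4 multiplies / 2 adds per complex multiply.
   Each expression is scaled by 9 so the integer arithmetic is exact. */
long sr_mults_42(long n, int m)
{
    long sign = (m % 2 == 0) ? 1 : -1;   /* (-1)^m */
    /* (4/3) N log2 N - (38/9) N + 6 + (2/9)(-1)^m */
    return (12 * n * m - 38 * n + 54 + 2 * sign) / 9;
}

long sr_adds_42(long n, int m)
{
    long sign = (m % 2 == 0) ? 1 : -1;   /* (-1)^m */
    /* (8/3) N log2 N - (16/9) N + 2 - (2/9)(-1)^m */
    return (24 * n * m - 16 * n + 18 - 2 * sign) / 9;
}

/* Radix-2 with special butterflies: 2N log2 N - 7N + 12 real multiplies */
long radix2_real_mults(long n, int m) { return 2 * n * m - 7 * n + 12; }
```

Both conventions give the same total of $4N \log_2 N - 6N + 8$ real operations, since trading one multiply for one add per general complex multiply leaves the sum unchanged. For N = 16 the split-radix count is 24 real multiplies (4/2 form) against 28 for radix-2.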
Comments 


• The split-radix algorithm has a somewhat irregular structure. Successful programs have been written
(Sorensen) for uni-processor machines, but it may be difficult to efficiently code the split-radix
algorithm for vector or multi-processor machines.

• G. Bruun's algorithm requires only N - 2 more operations than the split-radix algorithm and has a
regular structure, so it might be better for multi-processor or special-purpose hardware.

• The execution time of FFT programs generally depends more on compiler- or hardware-friendly
software design than on the exact computational complexity. See Efficient FFT Algorithm and
Programming Tricks for further pointers and links to good code.


Efficient FFT Algorithm and Programming Tricks 


The use of FFT algorithms such as the radix-2 decimation-in-time or
decimation-in-frequency methods results in tremendous savings in
computations when computing the discrete Fourier transform. While most
of the speed-up of FFTs comes from this, careful implementation can
provide additional savings ranging from a few percent to several-fold
increases in program speed.


Precompute twiddle factors 


The twiddle factor, or $W_N^k = e^{-i \frac{2\pi k}{N}}$, terms that multiply the intermediate
data in the FFT algorithms consist of cosines and sines that each take the
equivalent of several multiplies to compute. However, at most $N$ unique
twiddle factors can appear in any FFT or DFT algorithm. (For example, in
the radix-2 decimation-in-time FFT, only the $N/2$ twiddle factors
$W_N^k$, $k = \{0, 1, 2, \ldots, \frac{N}{2} - 1\}$, are used.) These twiddle factors
can be precomputed once and stored in an array in computer memory, and
accessed in the FFT algorithm by table lookup. This simple technique
yields very substantial savings and is almost always used in practice.


Compiler-friendly programming 


On most computers, only some of the total computation time of an FFT is 
spent performing the FFT butterfly computations; determining indices, 
loading and storing data, computing loop parameters and other operations 
consume the majority of cycles. Careful programming that allows the 
compiler to generate efficient code can make a several-fold improvement in 
the run-time of an FFT. The best choice of radix in terms of program speed 
may depend more on characteristics of the hardware (such as the number of 
CPU registers) or compiler than on the exact number of computations. Very 
often the manufacturer's library codes are carefully crafted by experts who 
know intimately both the hardware and compiler architecture and how to 
get the most performance out of them, so use of well-written FFT libraries 
is generally recommended. Certain freely available programs and libraries 
are also very good. Perhaps the best current general-purpose library is the
FFTW package; information can be found at http://www.fftw.org. A paper
by Frigo and Johnson describes many of the key issues in developing
compiler-friendly code.


Program in assembly language 


While compilers continue to improve, FFT programs written directly in the 
assembly language of a specific machine are often several times faster than 
the best compiled code. This is particularly true for DSP microprocessors, 
which have special instructions for accelerating FFTs that compilers don't 
use. (I have myself seen differences of up to 26 to 1 in favor of assembly!) 
Very often, FFTs in the manufacturer's or high-performance third-party 
libraries are hand-coded in assembly. For DSP microprocessors, the codes 
developed by Meyer, Schuessler, and Schwarz are perhaps the best ever 
developed; while the particular processors are now obsolete, the techniques 
remain equally relevant today. Most DSP processors provide special 
instructions and a hardware design favoring the radix-2 decimation-in-time 
algorithm, which is thus generally fastest on these machines. 


Special hardware 


Some processors have special hardware accelerators or co-processors
specifically designed to accelerate FFT computations. For example, AMI
Semiconductor's Toccata ultra-low-power DSP microprocessor family,
which is widely used in digital hearing aids, has on-chip FFT accelerators;
it is always faster and more power-efficient to use such accelerators and
whatever radix they prefer.


In a surprising number of applications, almost all of the computations are 
FFTs. A number of special-purpose chips are designed to specifically 
compute FFTs, and are used in specialized high-performance applications 
such as radar systems. Other systems, such as OFDM-based 
communications receivers, have special FFT hardware built into the digital 
receiver circuit. Such hardware can run many times faster, with much less 
power consumption, than FFT programs on general-purpose processors. 


Effective memory management 


Cache misses or excessive data movement between registers and memory 
can greatly slow down an FFT computation. Efficient programs such as the 
FFTW package are carefully designed to minimize these inefficiencies. 
In-place algorithms reuse the data memory throughout the transform, which 
can reduce cache misses for longer lengths. 


Real-valued FFTs 


FFTs of real-valued signals require only half as many computations as with 
complex-valued data. There are several methods for reducing the 
computation, which are described in more detail in Sorensen et al. 


1. Use DFT symmetry properties to do two real-valued DFTs at once 
with one FFT program 

2. Perform one stage of the radix-2 decimation-in-time decomposition 
and compute the two length-N/2 DFTs using the above approach. 

3. Use a direct real-valued FFT algorithm; see H.V. Sorensen et al. 
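
The first method can be sketched in a few lines. The example below is only an illustration, not the code from Sorensen et al.: the names `fft` and `two_real_ffts` are hypothetical, and `fft` is a small recursive radix-2 routine written for this sketch (power-of-two lengths only). Two real sequences a(n) and b(n) are packed into z(n) = a(n) + i b(n); the conjugate symmetry of real-signal DFTs then separates the two transforms.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def two_real_ffts(a, b):
    """DFTs of two real sequences a and b from ONE complex FFT of a + i*b."""
    n = len(a)
    z = fft([complex(p, q) for p, q in zip(a, b)])
    A, B = [], []
    for k in range(n):
        zc = z[(-k) % n].conjugate()   # Z*(N - k), with index taken modulo N
        A.append((z[k] + zc) / 2)      # conjugate-symmetric part: DFT of a
        B.append((z[k] - zc) / 2j)     # conjugate-antisymmetric part: DFT of b
    return A, B
```

The separation costs only about 2N extra additions, so each real transform costs roughly half of a complex FFT, consistent with the factor-of-two saving stated above.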


Special cases 


Occasionally only certain DFT frequencies are needed, the input signal 
values are mostly zero, the signal is real-valued (as discussed above), or 
other special conditions exist for which faster algorithms can be developed. 
Sorensen and Burrus describe slightly faster algorithms for pruned or zero- 
padded data. Goertzel's algorithm is useful when only a few DFT outputs 
are needed. The running FFT can be faster when DFTs of highly overlapped 
blocks of data are needed, as in a spectrogram. 


Higher-radix algorithms 


Higher-radix algorithms, such as the radix-4, radix-8, or split-radix FFTs, 
require fewer computations and can produce modest but worthwhile 
savings. Even the split-radix FFT reduces the multiplications by only 33% 
and the additions by a much lesser amount relative to the radix-2 FFTs; 
significant improvements in program speed are often due more to implicit 
loop unrolling or other compiler benefits than to the computational 
reduction itself! 


Fast bit-reversal 


Bit-reversing the input or output data can consume several percent of the 
total run-time of an FFT program. Several fast bit-reversal algorithms have 
been developed that can reduce this to two percent or less, including the 
method published by D.M.W. Evans. 


Trade additions for multiplications 


When FFTs first became widely used, hardware multipliers were relatively 
rare on digital computers, and multiplications generally required many 
more cycles than additions. Methods to reduce multiplications, even at the 
expense of a substantial increase in additions, were often beneficial. The 
prime factor algorithms and the Winograd Fourier transform algorithms, 
which required fewer multiplies and considerably more additions than the 
power-of-two-length algorithms, were developed during this period. 
Current processors generally have high-speed pipelined hardware 
multipliers, so trading multiplies for additions is often no longer beneficial. 
In particular, most machines now support single-cycle multiply-accumulate 
(MAC) operations, so balancing the number of multiplies and adds and 
combining them into single-cycle MACs generally results in the fastest 
code. Thus, the prime-factor and Winograd FFTs are rarely used today 
unless the application requires FFTs of a specific length. 


It is possible to implement a complex multiply with 3 real multiplies and 5 
real adds rather than the usual 4 real multiplies and 2 real adds: 

$$(C + iS)(X + iY) = CX - SY + i(CY + SX)$$

but alternatively 

$$Z = C(X - Y)$$
$$D = C + S$$
$$E = C - S$$
$$CX - SY = EY + Z$$
$$CY + SX = DX - Z$$


In an FFT, D and E come entirely from the twiddle factors, so they can be 
precomputed and stored in a look-up table. This reduces the cost of the 
complex twiddle-factor multiply to 3 real multiplies and 3 real adds, or one 
less and one more, respectively, than the conventional 4/2 computation. 
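
A minimal sketch of this 3-multiply form follows. The function names `twiddle_table` and `mul3` are hypothetical, and the table here stores (C, D, E) for the generic factor C + iS = cos θ + i sin θ; an actual FFT would store the values for its own twiddle-factor sign convention.

```python
import math

def twiddle_table(n):
    """Precompute (C, D, E) = (cos, cos + sin, cos - sin) for n equally spaced angles."""
    table = []
    for k in range(n):
        c = math.cos(2 * math.pi * k / n)
        s = math.sin(2 * math.pi * k / n)
        table.append((c, c + s, c - s))
    return table

def mul3(cde, x, y):
    """Return (C + iS)(X + iY) as (real, imag) using 3 real multiplies and 3 real adds."""
    c, d, e = cde
    z = c * (x - y)      # multiply 1, add 1
    real = e * y + z     # multiply 2, add 2:  CX - SY
    imag = d * x - z     # multiply 3, add 3:  CY + SX
    return real, imag
```

Because D and E are read from the table, the two additions that form them are paid once at setup rather than once per butterfly.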


Special butterflies 


Certain twiddle factors, namely $W_N^0 = 1$, $W_N^{N/2}$, $W_N^{N/4}$, $W_N^{N/8}$, $W_N^{3N/8}$, etc., 
can be implemented with no additional operations, or with fewer real 
operations than a general complex multiply. Programs that specially 
implement such butterflies in the most efficient manner throughout the 
algorithm can reduce the computational cost by up to several N multiplies 
and additions in a length-N FFT. 


Practical Perspective 


When optimizing FFTs for speed, it can be important to maintain 
perspective on the benefits that can be expected from any given 
optimization. The following list categorizes the various techniques by 
potential benefit; these will be somewhat situation- and machine-dependent, 
but clearly one should begin with the most significant and put the most 
effort where the pay-off is likely to be largest. 

Methods to speed up computation of DFTs 


• Tremendous Savings 

1. FFT ($\frac{N}{\log_2 N}$ savings) 

• Substantial Savings ($\ge 2$) 

1. Table lookup of cosine/sine 
2. Compiler tricks/good programming 
3. Assembly-language programming 
4. Special-purpose hardware 
5. Real-data FFT for real data (factor of 2) 
6. Special cases 

• Minor Savings 

1. Radix-4, split-radix (-10% - +30%) 
2. Special butterflies 
3. 3-real-multiplication complex multiply 
4. Fast bit-reversal (up to 6%) 


Note: On general-purpose machines, computation is only part of the total 
run time. Address generation, indexing, data shuffling, and memory access 
take up much or most of the cycles. 


Note: A well-written radix-2 program will run much faster than a poorly 
written split-radix program! 


Fast Convolution 


Fast Circular Convolution 


Since 

$$\sum_{m=0}^{N-1} x(m)\, h((n - m) \bmod N) = y(n)$$

is equivalent to $Y(k) = X(k)H(k)$, $y(n)$ can be computed as 

$$y(n) = \mathrm{IDFT}\left[\mathrm{DFT}[x(n)]\, \mathrm{DFT}[h(n)]\right]$$


Cost 
• Direct 
◦ $N^2$ complex multiplies. 
◦ $N(N - 1)$ complex adds. 
• Via FFTs 
◦ 3 FFTs + N multiplies. 
◦ $N + \frac{3N}{2}\log_2 N$ complex multiplies. 
◦ $3(N \log_2 N)$ complex adds. 

If H(k) can be precomputed, cost is only 2 FFTs + N multiplies. 
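
The equivalence can be checked numerically. The sketch below compares the direct circular-convolution sum with the DFT-domain product; `fft`, `ifft`, and the two convolution functions are names chosen for this illustration, and the FFT is a simple recursive radix-2 routine (power-of-two lengths only) rather than an optimized library.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    """Inverse DFT: conjugate-kernel FFT followed by 1/N scaling."""
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]

def circular_convolve_direct(x, h):
    """y(n) = sum_m x(m) h((n - m) mod N): N^2 multiplies."""
    n = len(x)
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]

def circular_convolve_fft(x, h):
    """y = IDFT[DFT[x] DFT[h]]: three FFTs plus N multiplies."""
    X, H = fft(x), fft(h)
    return ifft([a * b for a, b in zip(X, H)])
```

When the same filter is applied to many blocks, `fft(h)` would be computed once and reused, which is the precomputed-H(k) case noted above.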


Fast Linear Convolution 


The DFT produces circular convolution. For linear convolution, we must 
zero-pad sequences so that circular wrap-around always wraps over zeros. 


[Figure: h(n) is a length-M sequence, x(n) is a length-L sequence, and their linear convolution y(n) is a length-(L + M - 1) sequence.] 


To achieve linear convolution using fast circular convolution, we must use 
zero-padded DFTs of length $N \ge L + M - 1$. 

[Figure: x(m) and the circularly shifted filter $h((n - m))_N$, with the wrap-around falling on zeros.] 


Choose shortest convenient N (usually smallest power-of-two greater than 
or equal to L + M — 1) 


$$y(n) = \mathrm{IDFT}_N\left[\mathrm{DFT}_N[x(n)]\, \mathrm{DFT}_N[h(n)]\right]$$


Note: There is some inefficiency when compared to circular convolution 
due to longer zero-padded DFTs. Still, $O\left(\frac{N}{\log_2 N}\right)$ savings over direct 
computation. 
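
As a rough illustration of the zero-padding recipe, the sketch below pads both sequences to the smallest power of two not less than L + M - 1 and multiplies their DFTs. The helper names (`fft`, `ifft`, `next_pow2`, `fast_linear_convolve`) are hypothetical, the FFT is a plain recursive radix-2 routine, and real-valued inputs are assumed so that only the real part of the result is kept.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]

def next_pow2(n):
    """Smallest power of two greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p

def fast_linear_convolve(x, h):
    """Linear convolution of real sequences via zero-padded length-N DFTs."""
    out_len = len(x) + len(h) - 1            # L + M - 1
    n = next_pow2(out_len)                   # N >= L + M - 1
    X = fft(list(x) + [0.0] * (n - len(x)))
    H = fft(list(h) + [0.0] * (n - len(h)))
    y = ifft([a * b for a, b in zip(X, H)])
    return [v.real for v in y[:out_len]]     # wrap-around fell on zeros
```

For example, convolving [1, 2, 3] with [1, 1] this way uses length-4 DFTs and returns the four linear-convolution samples, up to rounding error.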


Running Convolution 


Suppose L = ∞, as in a real-time filter application, or L ≫ M. There are 
efficient block methods for computing fast convolution. 


Overlap-Save (OLS) Method 


Note that if a length-M filter h(n) is circularly convolved with a length-N 
segment of a signal x(n), 

[Figure: a length-N segment of x(m) and the circularly shifted filter $h((n - m))_N$.] 

the first $M - 1$ samples are wrapped around and thus are incorrect. 
However, for $M - 1 \le n \le N - 1$, the circular convolution equals the linear 
convolution, so these samples are correct. Thus $N - M + 1$ good outputs are 
produced for each length-N circular convolution. 


The Overlap-Save Method: Break the long signal into successive blocks of N 
samples, each block overlapping the previous block by $M - 1$ samples. 
Perform circular convolution of each block with the filter h(n). Discard the 
first $M - 1$ points in each output block, and concatenate the remaining 
points to create y(n). 


[Figure: overlap-save method, showing the overlapping blocks of the input signal and the concatenated output signal.] 


Computational cost per output sample for a length-$N = 2^n$ FFT 
(assuming precomputed H(k)) is 2 FFTs and N multiplies: 

$$\frac{2\left(\frac{N}{2}\log_2 N\right) + N}{N - M + 1} = \frac{N(\log_2 N + 1)}{N - M + 1} \text{ complex multiplies}$$

$$\frac{2(N\log_2 N)}{N - M + 1} = \frac{2N\log_2 N}{N - M + 1} \text{ complex adds}$$


Compare to M mults, $M - 1$ adds per output point for the direct method. For a 
given M, the optimal N can be determined by finding the N that minimizes the 
operation counts. Usually, the optimal N is $4M \le N_{opt} \le 8M$. 
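
The overlap-save procedure might be sketched as follows. This is an illustrative implementation under stated assumptions, not a tuned one: `overlap_save`, `fft`, and `ifft` are names invented here, the block length n must be a power of two no smaller than M, the input is real-valued, and M - 1 zeros are prepended so that the first block's discarded samples are start-up transient.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]

def overlap_save(x, h, n):
    """First len(x) samples of the linear convolution of x and h, by overlap-save."""
    m = len(h)
    step = n - m + 1                           # good outputs per block
    H = fft(list(h) + [0.0] * (n - m))         # precomputed H(k)
    padded = [0.0] * (m - 1) + list(x)         # leading zeros for the first block
    y = []
    pos = 0
    while pos < len(x):
        block = padded[pos:pos + n]
        block = block + [0.0] * (n - len(block))   # pad the final short block
        Y = ifft([a * b for a, b in zip(fft(block), H)])
        y.extend(v.real for v in Y[m - 1:])    # discard the first M - 1 points
        pos += step                            # blocks overlap by M - 1 samples
    return y[:len(x)]
```

Each pass through the loop performs two FFTs and N multiplies and yields N - M + 1 valid outputs, which is the per-sample cost counted above.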


Overlap-Add (OLA) Method 


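Zero-pad length-L blocks by $M - 1$ samples, and convolve each zero-padded 
block with h(n) using length-$N = L + M - 1$ FFTs. 

Add successive blocks, overlapped by $M - 1$ samples, so that the tails sum 
to produce the complete linear convolution. 

[Figure: overlap-add method; the overlapping tails of successive blocks are summed to form the output data.] 

Computational Cost: Two length $N = L + M - 1$ FFTs and N mults and 
$M - 1$ adds per L output points; essentially the same as the OLS method. 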
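
A comparable sketch of overlap-add is given below, again with hypothetical names (`overlap_add`, `fft`, `ifft`, `next_pow2`) and a simple recursive radix-2 FFT. One assumption differs from the text: the FFT length is rounded up to a power of two at least as large as L + M - 1, which the recursive routine requires. Real-valued data are assumed.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]

def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

def overlap_add(x, h, block_len):
    """Full linear convolution of x and h, built from length-block_len input blocks."""
    m = len(h)
    n = next_pow2(block_len + m - 1)           # FFT length N >= L + M - 1
    H = fft(list(h) + [0.0] * (n - m))         # precomputed H(k)
    y = [0.0] * (len(x) + m - 1)
    for start in range(0, len(x), block_len):
        block = list(x[start:start + block_len])
        seg_len = len(block) + m - 1           # length of this block's convolution
        Y = ifft([a * b for a, b in zip(fft(block + [0.0] * (n - len(block))), H)])
        for j in range(seg_len):
            y[start + j] += Y[j].real          # tails overlap the next block and sum
    return y
```

Unlike overlap-save, nothing is discarded; the M - 1 adds per block come from summing each tail into the start of the following block.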


Chirp-z Transform 
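Let $z_k = AW^{-k}$, where $A = A_0 e^{i\theta_0}$, $W = W_0 e^{-i\phi_0}$. 

We wish to compute M samples, k = [0, 1, 2, ..., M - 1] of 

$$X(z_k) = \sum_{n=0}^{N-1} x(n) z_k^{-n} = \sum_{n=0}^{N-1} x(n) A^{-n} W^{nk}$$

Note that $(k - n)^2 = n^2 - 2nk + k^2$ implies $nk = \frac{1}{2}\left(n^2 + k^2 - (k - n)^2\right)$, so 

$$X(z_k) = \sum_{n=0}^{N-1} x(n) A^{-n} W^{\frac{n^2}{2}} W^{\frac{k^2}{2}} W^{-\frac{(k-n)^2}{2}} = W^{\frac{k^2}{2}} \sum_{n=0}^{N-1} \left(x(n) A^{-n} W^{\frac{n^2}{2}}\right) W^{-\frac{(k-n)^2}{2}}$$

Thus, $X(z_k)$ can be computed by 

1. Premultiply x(n) by $A^{-n} W^{\frac{n^2}{2}}$, n = [0, 1, ..., N - 1], to make y(n) 

2. Linearly convolve y(n) with $W^{-\frac{n^2}{2}}$ 

3. Postmultiply by $W^{\frac{k^2}{2}}$ to get $X(z_k)$. 

Steps 1 and 3 require N and M operations respectively. Step 2 can be 
performed efficiently using fast convolution. 

[Figure: y(n) occupies 0 ≤ n ≤ N - 1, and the outputs of interest occupy 0 ≤ k ≤ M - 1 of the linear convolution with $W^{-\frac{n^2}{2}}$.] 

$W^{-\frac{n^2}{2}}$ is required only for $-(N - 1) \le n \le M - 1$, so this linear 
convolution can be implemented with length-$L \ge N + M - 1$ FFTs. 

Note: Wrap $W^{-\frac{n^2}{2}}$ around L when implementing with circular 
convolution. 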


So, a weird-length DFT can be implemented relatively efficiently using 
power-of-two algorithms via the chirp-z transform. 


Also useful for "Zoom-FFTs". 
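
The sketch below specializes the chirp-z steps to an ordinary DFT of arbitrary length N, that is, A = 1, W = e^{-i2π/N}, and M = N; these choices, the function names, and the recursive radix-2 `fft` helper are assumptions made for the illustration. The chirp $W^{-n^2/2}$ is wrapped around the FFT length L as described in the note above.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(X):
    n = len(X)
    return [v / n for v in fft(X, inverse=True)]

def chirp_z_dft(x):
    """Length-N DFT for any N, computed with power-of-two FFTs of length L >= 2N - 1."""
    n = len(x)
    L = 1
    while L < 2 * n - 1:
        L *= 2
    # chirp[k] = W^(k^2 / 2) with W = exp(-i 2 pi / N)
    chirp = [cmath.exp(-1j * cmath.pi * k * k / n) for k in range(n)]
    # Step 1: premultiply and zero-pad to length L
    y = [x[k] * chirp[k] for k in range(n)] + [0j] * (L - n)
    # Kernel W^(-m^2 / 2) for -(N - 1) <= m <= N - 1, negative m wrapped around L
    v = [0j] * L
    for k in range(n):
        v[k] = chirp[k].conjugate()
        v[(-k) % L] = chirp[k].conjugate()
    # Step 2: fast circular convolution of length L
    conv = ifft([a * b for a, b in zip(fft(y), fft(v))])
    # Step 3: postmultiply by W^(k^2 / 2)
    return [chirp[k] * conv[k] for k in range(n)]
```

With this approach a length-5 or length-7 DFT is obtained from three length-16 FFTs, which illustrates both the generality of the method and its constant-factor overhead.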


FFTs of prime length and Rader's conversion 


The power-of-two FFT algorithms, such as the radix-2 and radix-4 FFTs, and 
the common-factor and prime-factor FFTs, achieve great reductions in 
computational complexity of the DFT when the length, N, is a composite 
number. DFTs of prime length are sometimes needed, however, particularly for 
the short-length DFTs in common-factor or prime-factor algorithms. The 
methods described here, along with the composite-length algorithms, allow 
fast computation of DFTs of any length. 


There are two main ways of performing DFTs of prime length: 


1. Rader's conversion, which is most efficient, and the 
2. Chirp-z transform, which is simpler and more general. 


Oddly enough, both work by turning prime-length DFTs into convolution! The 
resulting convolutions can then be computed efficiently by either 


1. fast convolution via composite-length FFTs (simpler) or by 
2. Winograd techniques (more efficient) 


Rader's Conversion 


Rader's conversion is a one-dimensional index-mapping scheme that turns a 
length-N DFT (N prime) into a length-( N — 1) convolution and a few 
additions. Rader's conversion works only for prime-length N. 


An index map simply rearranges the order of the sum operation in the DFT 
definition. Because addition is a commutative operation, the same 
mathematical result is produced from any order, as long as all of the same 
terms are added once and only once. (This is the condition that defines an 
index map.) Unlike the multi-dimensional index maps used in deriving 
common-factor and prime-factor FFTs, Rader's conversion uses a 
one-dimensional index map in a finite group of N integers: $k = r^m \bmod N$ 


Fact from number theory 


If N is prime, there exists an integer "r" called a primitive root, such that the 
index map $k = r^m \bmod N$, m = [0, 1, 2, ..., N - 2], uniquely generates all 
elements k = [1, 2, 3, ..., N - 1] 


Example: 

N = 5, r = 2 
$2^0 \bmod 5 = 1$ 
$2^1 \bmod 5 = 2$ 
$2^2 \bmod 5 = 4$ 
$2^3 \bmod 5 = 3$ 


Another fact from number theory 


For N prime, the inverse of r (i.e., the integer $r^{-1}$ such that 
$r^{-1} r \bmod N = 1$) is also a primitive root (call it $r^{-1}$). 


Example: 
N = 5, r = 2, $r^{-1}$ = 3 

$(2 \times 3) \bmod 5 = 1$ 

$3^0 \bmod 5 = 1$ 
$3^1 \bmod 5 = 3$ 
$3^2 \bmod 5 = 4$ 
$3^3 \bmod 5 = 2$ 


So why do we care? Because we can use these facts to turn a DFT into a 
convolution! 


Rader's Conversion 


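Let $n = r^{-m} \bmod N$ for m = [0, 1, ..., N - 2], so that n ∈ [1, 2, ..., N - 1], 
and $k = r^p \bmod N$ for p = [0, 1, ..., N - 2], so that k ∈ [1, 2, ..., N - 1]. Then 

$$X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk} = \begin{cases} x(0) + \sum_{n=1}^{N-1} x(n) W_N^{nk} & \text{if } k \ne 0 \\ \sum_{n=0}^{N-1} x(n) & \text{if } k = 0 \end{cases}$$

where for convenience $W_N^{nk} = e^{-i\frac{2\pi nk}{N}}$ in the DFT equation. For $k \ne 0$ 

$$X(r^p \bmod N) = \sum_{m=0}^{N-2} x(r^{-m} \bmod N)\, W^{r^p r^{-m}} + x(0) = \sum_{m=0}^{N-2} x(r^{-m} \bmod N)\, W^{r^{p-m}} + x(0) = x(0) + x(r^{-l} \bmod N) * W^{r^l}$$

where l = [0, 1, ..., N - 2] and $*$ denotes circular convolution of length N - 1. 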


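Example: 
N = 5, r = 2, $r^{-1}$ = 3 

$$\begin{pmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \\ X(4) \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 2 & 4 & 1 & 3 \\ 0 & 3 & 1 & 4 & 2 \\ 0 & 4 & 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ x(4) \end{pmatrix}$$

$$\begin{pmatrix} X(0) \\ X(1) \\ X(2) \\ X(4) \\ X(3) \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 3 & 4 & 2 \\ 0 & 2 & 1 & 3 & 4 \\ 0 & 4 & 2 & 1 & 3 \\ 0 & 3 & 4 & 2 & 1 \end{pmatrix} \begin{pmatrix} x(0) \\ x(1) \\ x(3) \\ x(4) \\ x(2) \end{pmatrix}$$

where for visibility the matrix entries represent only the power, m, of the 
corresponding DFT term $W_N^m$. Note that the 4-by-4 circulant matrix 

$$\begin{pmatrix} 1 & 3 & 4 & 2 \\ 2 & 1 & 3 & 4 \\ 4 & 2 & 1 & 3 \\ 3 & 4 & 2 & 1 \end{pmatrix}$$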

corresponds to a length-4 circular convolution. 
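
The index maps can be exercised directly in code. The sketch below is an illustration under several assumptions: `primitive_root` uses a brute-force search that is adequate only for small odd primes, the inverse root is obtained from Fermat's little theorem, and the length-(N - 1) circular convolution is evaluated by the direct sum for clarity rather than by a fast convolution.

```python
import cmath

def primitive_root(n):
    """Smallest primitive root of an odd prime n, found by exhaustive search."""
    for r in range(2, n):
        if len({pow(r, m, n) for m in range(n - 1)}) == n - 1:
            return r
    raise ValueError("no primitive root found; n must be an odd prime")

def rader_dft(x):
    """DFT of prime length N as x(0) plus a length-(N - 1) circular convolution."""
    n = len(x)
    r = primitive_root(n)
    r_inv = pow(r, n - 2, n)                     # r^(-1) mod N for prime N
    # a(m) = x(r^(-m) mod N),  b(l) = W^(r^l) with W = exp(-i 2 pi / N)
    a = [x[pow(r_inv, m, n)] for m in range(n - 1)]
    b = [cmath.exp(-2j * cmath.pi * pow(r, l, n) / n) for l in range(n - 1)]
    X = [0j] * n
    X[0] = sum(x)                                # the k = 0 term is a plain sum
    for p in range(n - 1):
        acc = sum(a[m] * b[(p - m) % (n - 1)] for m in range(n - 1))
        X[pow(r, p, n)] = x[0] + acc             # output index k = r^p mod N
    return X
```

For N = 5 the search returns r = 2 and the code visits the outputs in the order k = 1, 2, 4, 3, matching the permuted matrix in the example.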


Rader's conversion turns a prime-length DFT into a few adds and a 
composite-length (N - 1) circular convolution, which can be computed 
efficiently using either 

1. fast convolution via FFT and IFFT 

2. index-mapped convolution algorithms and short Winograd convolution 
algorithms. (Rather complicated, and trades fewer multiplies for many 
more adds, which may not be worthwhile on most modern processors.) See 
R.C. Agarwal and J.W. Cooley. 


Winograd minimum-multiply convolution and DFT algorithms 


S. Winograd has proved that a length-N circular or linear convolution or DFT 
requires less than 2N multiplies (for real data), or 4N real multiplies for 
complex data. (This doesn't count multiplies by rational constants, such as 3 
or simple rational fractions, which can be computed with additions and one 
overall scaling factor.) 
Furthermore, Winograd showed how to construct algorithms achieving these 
counts. Winograd prime-length DFTs and convolutions have the following 
characteristics: 


1. Extremely efficient for small N (N < 20) 
2. The number of adds becomes huge for large N. 


Thus Winograd's minimum-multiply FFTs are useful only for small N. They 
are very important for Prime-Factor Algorithms, which generally use 
Winograd modules to implement the short-length DFTs. Tables giving the 
multiplies and adds necessary to compute Winograd FFTs for various lengths 
can be found in C.S. Burrus (1988). Tables and FORTRAN and TMS32010 
programs for these short-length transforms can be found in C.S. Burrus and 
T.W. Parks (1985). The theory and derivation of these algorithms is quite 
elegant but requires substantial background in number theory and abstract 
algebra. Fortunately for the practitioner, all of the short algorithms one is likely 
to need have already been derived and can simply be looked up without 
mastering the details of their derivation. 


Winograd Fourier Transform Algorithm (WFTA) 


The Winograd Fourier Transform Algorithm (WFTA) is a technique that 
recombines the short Winograd modules in a prime-factor FFT into a 
composite-N structure with fewer multiplies but more adds. While 
theoretically interesting, WFTAs are complicated and different for every 
length, and on modern processors with hardware multipliers the trade of 
multiplies for many more adds is very rarely useful in practice today. 


Choosing the Best FFT Algorithm 


Choosing an FFT length 


The most commonly used FFT algorithms by far are the power-of-two- 
length FFT algorithms. The Prime Factor Algorithm (PFA) and Winograd 
Fourier Transform Algorithm (WFTA) require somewhat fewer multiplies, 
but the overall difference usually isn't sufficient to warrant the extra 
difficulty. This is particularly true now that most processors have single- 
cycle pipelined hardware multipliers, so the total operation count is more 
relevant. As can be seen from the following table, for similar lengths the 
split-radix algorithm is comparable in total operations to the Prime Factor 
Algorithm, and is considerably better than the WFTA, although the PFA and 
WFTA require fewer multiplications and more additions. Many processors 
now support single-cycle multiply-accumulate (MAC) operations; in the 
power-of-two algorithms all multiplies can be combined with adds in 
MACs, so the number of additions is the most relevant indicator of 
computational cost. 


Algorithm          FFT length   Multiplies (real)   Adds (real)   Mults + Adds 
Radix 2            1024         10248               30728         40976 
Split Radix        1024         7172                27652         34824 
Prime Factor Alg   1008         5804                29100         34904 
Winograd FT Alg    1008         3548                34416         37964 

Representative FFT Operation Counts 


The Winograd Fourier Transform Algorithm is particularly difficult to 
program and is rarely used in practice. For applications in which the 
transform length is somewhat arbitrary (such as fast convolution or general 
spectrum analysis), the length is usually chosen to be a power of two. When 
a particular length is required (for example, in the USA each carrier has 
exactly 416 frequency channels in each band in the AMPS cellular 
telephone standard), a Prime Factor Algorithm for all the relatively prime 
terms is preferred, with a Common Factor Algorithm for other non-prime 
lengths. Winograd's short-length modules should be used for the prime- 
length factors that are not powers of two. The chirp z-transform offers a 
universal way to compute any length DFT (for example, Matlab reportedly 
uses this method for lengths other than a power of two), at a few times 
higher cost than that of a CFA or PFA optimized for that specific length. 
The chirp z-transform, along with Rader's conversion, assure us that 
algorithms of O(N log N) complexity exist for any DFT length N. 


Selecting a power-of-two-length algorithm 


The choice of a power-of-two algorithm may not just depend on 
computational complexity. The latest extensions of the split-radix algorithm 
offer the lowest known power-of-two FFT operation counts, but the 
10%-30% difference may not make up for other factors such as regularity of 
structure or data flow, FFT programming tricks, or special hardware 
features. For example, the decimation-in-time radix-2 FFT is the fastest 
FFT on Texas Instruments' TMS320C54x DSP microprocessors, because 
this processor family has special assembly-language instructions that 
accelerate this particular algorithm. On other hardware, radix-4 algorithms 
may be more efficient. Some devices, such as AMI Semiconductor's 
Toccata ultra-low-power DSP microprocessor family, have on-chip FFT 
accelerators; it is always faster and more power-efficient to use these 
accelerators and whatever radix they prefer. For fast convolution, the 
decimation-in-frequency algorithms may be preferred because the bit- 
reversing can be bypassed; however, most DSP microprocessors provide 
zero-overhead bit-reversed indexing hardware and prefer decimation-in- 
time algorithms, so this may not be true for such machines. Good, compiler- 
or hardware-friendly programming always matters more than modest 
differences in raw operation counts, so manufacturers’ or good third-party 
FFT libraries are often the best choice. The module FFT programming 
tricks references some good, free FFT software (including the FFTW 
package) that is carefully coded to be compiler-friendly; such codes are 
likely to be considerably faster than codes written by the casual 
programmer. 


Multi-dimensional FFTs 


Multi-dimensional FFTs pose additional possibilities and problems. The 
orthogonality and separability of multi-dimensional DFTs allow them to be 
efficiently computed by a series of one-dimensional FFTs along each 
dimension. (For example, a two-dimensional DFT can quickly be computed 
by performing FFTs of each row of the data matrix followed by FFTs of all 
columns, or vice-versa.) Vector-radix FFTs have been developed with 
higher efficiency per sample than row-column algorithms. Multi- 
dimensional datasets, however, are often large and frequently exceed the 
cache size of the processor, and excessive cache misses may increase the 
computational time greatly, thus overwhelming any minor complexity 
reduction from a vector-radix algorithm. Either vector-radix FFTs must be 
carefully programmed to match the cache limitations of a specific 
processor, or a row-column approach should be used with matrix 
transposition in between to ensure data locality for high cache utilization 
throughout the computation. 
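
A minimal row-column sketch is shown below. It transposes between the two passes so that every one-dimensional FFT runs over a contiguous list, in the spirit of the transposition suggested above; in pure Python this says nothing about real cache behavior, and the names `fft` and `fft2` as well as the recursive radix-2 helper (power-of-two dimensions only) are assumptions of the example.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(matrix):
    """Two-dimensional DFT by the row-column method."""
    rows = [fft(row) for row in matrix]               # FFT of every row
    cols = [fft(list(col)) for col in zip(*rows)]     # transpose, FFT of every column
    return [list(row) for row in zip(*cols)]          # transpose back
```

The same pattern extends to more dimensions by applying one-dimensional FFTs along each axis in turn.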


Few time or frequency samples 


FFT algorithms gain their efficiency through intermediate computations that 
can be reused to compute many DFT frequency samples at once. Some 
applications require only a handful of frequency samples to be computed; 
when that number is of order less than O(log N), direct computation of 
those values via Goertzel's algorithm is faster. This has the additional 
advantage that any frequency, not just the equally-spaced DFT frequency 
samples, can be selected. Sorensen and Burrus developed algorithms for 
when most input samples are zero or only a block of DFT frequencies are 
needed, but the computational cost is of the same order. 
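
For reference, one common form of Goertzel's algorithm is sketched below. The function name `goertzel` and the choice to take the frequency as an angle `omega` in radians per sample are assumptions of this example; setting omega = 2πk/N gives the k-th DFT sample, and any other value gives a frequency between the DFT samples.

```python
import cmath
import math

def goertzel(x, omega):
    """Evaluate X(omega) = sum_n x(n) exp(-i omega n) with a second-order recursion."""
    coeff = 2.0 * math.cos(omega)
    s1 = 0.0   # s(n - 1)
    s2 = 0.0   # s(n - 2)
    for sample in x:
        s0 = sample + coeff * s1 - s2    # one real multiply per real input sample
        s2, s1 = s1, s0
    # y(N - 1) = s(N - 1) - exp(-i omega) s(N - 2) equals exp(i omega (N - 1)) X(omega)
    y_last = s1 - cmath.exp(-1j * omega) * s2
    return cmath.exp(-1j * omega * (len(x) - 1)) * y_last
```

Each frequency costs about N multiplies and 2N adds for real data, so the method is attractive only while the number of frequencies stays small, as noted above.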


Some applications, such as time-frequency analysis via the short-time 
Fourier transform or spectrogram, require DFTs of overlapped blocks of 
discrete-time samples. When the step-size between blocks is less than 
O(log N), the running FFT will be most efficient. (Note that any window 
must be applied via frequency-domain convolution, which is quite efficient 
for sinusoidal windows such as the Hamming window.) For step-sizes of 
O(log N) or greater, computation of the DFT of each successive block via 
an FFT is faster. 
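
The text does not spell out the running FFT, so the following is only one common formulation (often called the sliding DFT), offered as an assumption: when the block advances by one sample, each DFT value is updated by removing the oldest sample, adding the newest, and rotating by one twiddle factor, for O(N) work per step. The names `dft` and `running_dft` are hypothetical.

```python
import cmath

def dft(block):
    """Direct DFT, used here only to initialize the first block."""
    n = len(block)
    return [sum(block[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def running_dft(x, n):
    """Length-n DFTs of all blocks x[s:s+n], s = 0, 1, ..., updated one sample at a time."""
    X = dft(x[:n])
    twiddles = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
    out = [list(X)]
    for s in range(len(x) - n):
        diff = x[s + n] - x[s]                 # newest sample in, oldest sample out
        X = [(X[k] + diff) * twiddles[k] for k in range(n)]
        out.append(list(X))
    return out
```

A step of s samples costs about sN operations this way, against roughly N log N for a fresh FFT, which is consistent with the O(log N) break-even point stated above. The recursion accumulates rounding error over long runs, so practical versions usually re-initialize periodically.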


Why Transforms? 


In the field of signal processing we frequently encounter the use of 
transforms. Transforms are named such because they take a signal and 
transform it into another signal, hopefully one which is easier to process or 
analyze than the original. Essentially, transforms are used to manipulate 
signals such that their most important characteristics are made plainly 
evident. To isolate a signal's important characteristics, however, one must 
employ a transform that is well matched to that signal. For example, the 
Fourier transform, while well matched to certain classes of signal, does not 
efficiently extract information about signals in other classes. This latter fact 
motivates our development of the wavelet transform. 


Limitations of Fourier Analysis 


Let's consider the Continuous-Time Fourier Transform (CTFT) pair: 


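$$X(\Omega) = \int_{-\infty}^{\infty} x(t) e^{-i\Omega t}\, dt$$

$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\Omega) e^{i\Omega t}\, d\Omega$$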


The Fourier transform pair supplies us with our notion of "frequency." In 
other words, all of our intuitions regarding the relationship between the 
time domain and the frequency domain can be traced to this particular 
transform pair. 


It will be useful to view the CTFT in terms of basis elements. The inverse 
CTFT equation above says that the time-domain signal x(t) can be 
expressed as a weighted summation of basis elements 
$\{b_\Omega(t) \mid -\infty < \Omega < \infty\}$, where $b_\Omega(t) = e^{i\Omega t}$ is the basis element 
corresponding to frequency $\Omega$. In other words, the basis elements are 
parameterized by the variable $\Omega$ that we call frequency. Finally, $X(\Omega)$ 
specifies the weighting coefficient for $b_\Omega(t)$. In the case of the CTFT, the 
number of basis elements is uncountably infinite, and thus we need an 
integral to express the summation. 


The Fourier Series (FS) can be considered as a special sub-case of the 
CTFT that applies when the time-domain signal is periodic. Recall that if 
x(t) is periodic with period T, then it can be expressed as a weighted 
summation of basis elements $\{b_k(t)\}_{k=-\infty}^{\infty}$, where $b_k(t) = e^{i\frac{2\pi}{T}tk}$: 

$$x(t) = \sum_{k=-\infty}^{\infty} X[k]\, e^{i\frac{2\pi}{T}tk}$$

Here the basis elements come from a countably-infinite set, parameterized 
by the frequency index $k \in \mathbb{Z}$. The coefficients $\{X[k]\}_{k=-\infty}^{\infty}$ specify the 
strength of the corresponding basis elements within signal x(t). 


Though quite popular, Fourier analysis is not always the best tool to analyze 
a signal whose characteristics vary with time. For example, consider a 
signal composed of a periodic component plus a sharp "glitch" at time $t_0$, 
illustrated in the time and frequency domains in [link]. 

[Figure: the signal $x(t)$ with a glitch at time $t_0$, and its spectrum magnitude $|X(\Omega)|$.] 


Fourier analysis is successful in reducing the complicated-looking periodic 
component into a few simple parameters: the frequencies $\{\Omega_1, \Omega_2\}$ and 
their corresponding magnitudes/phases. The glitch component, described 
compactly in terms of the time-domain location $t_0$ and amplitude, however, 
is not described efficiently in the frequency domain since it produces a wide 
spread of frequency components. Thus, neither time- nor frequency-domain 
representations alone give an efficient description of the glitched periodic 
signal: each representation distills only certain aspects of the signal. 


As another example, consider the linear chirp $x(t) = \sin(\Omega t^2)$ illustrated 
in [link]. 

[Figure: the linear chirp $x(t)$.] 

Though written using the $\sin(\cdot)$ function, the chirp is not described by a 
single Fourier frequency. We might try to be clever and write 


$$\sin(\Omega t^2) = \sin(\Omega t \cdot t) = \sin(\Omega(t) \cdot t)$$

where it now seems that the signal has an instantaneous frequency 
$\Omega(t) = \Omega t$ which grows linearly in time. But here we must be cautious! 
Our newly-defined instantaneous frequency $\Omega(t)$ is not consistent with the 
Fourier notion of frequency. Recall that the CTFT says that a signal can be 
constructed as a superposition of fixed-frequency basis elements $e^{i\Omega t}$ with 
time support from $-\infty$ to $+\infty$; these elements are evenly spread out over 
all time, and so there is nothing instantaneous about Fourier frequency! So, 
while instantaneous frequency gives a compact description of the linear 
chirp, Fourier analysis is not capable of uncovering this simple structure. 


As a third example, consider a sinusoid of frequency $\Omega_0$ that is 
rectangularly windowed to extract only one period ([link]). 

[Figure: the windowed sinusoid $x(t)$ and its spectrum magnitude $|X(\Omega)|$.] 

Instantaneous-frequency arguments would claim that 

$$x(t) = \sin(\Omega(t) \cdot t), \qquad \Omega(t) = \begin{cases} \Omega_0 & \text{if } t \in \text{window} \\ 0 & \text{if } t \notin \text{window} \end{cases}$$

where $\Omega(t)$ takes on exactly two distinct "frequency" values. In contrast, 
Fourier theory says that rectangular windowing induces a frequency-domain 
spreading by a $\frac{\sin(\Omega)}{\Omega}$ profile, resulting in a continuum of Fourier 
frequency components. Here again we see that Fourier analysis does not 
efficiently decompose signals whose "instantaneous frequency" varies with 
time. 


Time-Frequency Uncertainty Principle 


Recall that Fourier basis elements $b_\Omega(t) = e^{i\Omega t}$ exhibit poor time 
localization abilities - a consequence of the fact that $b_\Omega(t)$ is evenly spread 
over all $t \in (-\infty, \infty)$. By time localization we mean the ability to clearly 
identify signal events which manifest during a short time interval, such as 
the "glitch" described in an earlier example. 


At the opposite extreme, a basis composed of shifted Dirac deltas 
$b_\tau(t) = \delta(t - \tau)$ would have excellent time localization but terrible 
"frequency localization," since every Dirac basis element is evenly spread 
over all Fourier frequencies $\Omega \in (-\infty, \infty)$. This can be seen via 
$|B_\tau(\Omega)| = \left|\int_{-\infty}^{\infty} b_\tau(t) e^{-i\Omega t}\, dt\right| = 1$ for all $\Omega$, regardless of $\tau$. By 
frequency localization we mean the ability to clearly identify signal 
components which are concentrated at particular Fourier frequencies, such 
as sinusoids. 


These observations motivate the question: does there exist a basis that 
provides both excellent frequency localization and excellent time 
localization? The answer is "not really": there is a fundamental tradeoff 
between the time localization and frequency localization of any basis 
element. This idea is made concrete below. 


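Let us consider an arbitrary waveform, or basis element, b(t). Its CTFT will 
be denoted by $B(\Omega)$. Define the energy of the waveform to be E, so that 
(by Parseval's theorem) 

$$E = \int_{-\infty}^{\infty} |b(t)|^2\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |B(\Omega)|^2\, d\Omega$$

Next, define the temporal and spectral centers [footnote] as 

$$t_c = \frac{1}{E} \int_{-\infty}^{\infty} t\, |b(t)|^2\, dt$$

$$\Omega_c = \frac{1}{2\pi E} \int_{-\infty}^{\infty} \Omega\, |B(\Omega)|^2\, d\Omega$$

and the temporal and spectral widths [footnote] as 

$$\Delta_t = \sqrt{\frac{1}{E} \int_{-\infty}^{\infty} (t - t_c)^2\, |b(t)|^2\, dt}$$

$$\Delta_\Omega = \sqrt{\frac{1}{2\pi E} \int_{-\infty}^{\infty} (\Omega - \Omega_c)^2\, |B(\Omega)|^2\, d\Omega}$$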


If the waveform is well-localized in time, then b(t) will be concentrated at 
the point $t_c$ and $\Delta_t$ will be small. If the waveform is well-localized in 
frequency, then $B(\Omega)$ will be concentrated at the point $\Omega_c$ and $\Delta_\Omega$ will be 
small. If the waveform is well-localized in both time and frequency, then 
$\Delta_t \Delta_\Omega$ will be small. The quantity $\Delta_t \Delta_\Omega$ is known as the time-bandwidth 
product. 


It may be interesting to note that both $\frac{1}{E}|b(t)|^2$ and $\frac{1}{2\pi E}|B(\Omega)|^2$ are 
non-negative and integrate to one, thereby satisfying the requirements of 
probability density functions for random variables t and $\Omega$. The 
temporal/spectral centers can then be interpreted as the means (i.e., centers 
of mass) of t and $\Omega$. 

The quantities $\Delta_t^2$ and $\Delta_\Omega^2$ can be interpreted as the variances of t and $\Omega$, 
respectively. 


From the definitions above one can derive the fundamental properties 
below. When interpreting the properties, it helps to think of the waveform 
b(t) as a prototype that can be used to generate an entire basis set. For 
example, the Fourier basis $\{b_\Omega(t) \mid -\infty < \Omega < \infty\}$ can be 
generated by frequency shifts of b(t) = 1, while the Dirac basis 
$\{b_\tau(t) \mid -\infty < \tau < \infty\}$ can be generated by time shifts of 
$b(t) = \delta(t)$. 


1. $\Delta_t$ and $\Delta_\Omega$ are invariant to time and frequency [footnote] shifts. 

$$\forall t_0 \in \mathbb{R}: \; \Delta_t(b(t)) = \Delta_t(b(t - t_0))$$

$$\forall \Omega_0 \in \mathbb{R}: \; \Delta_\Omega(B(\Omega)) = \Delta_\Omega(B(\Omega - \Omega_0))$$

This implies that all basis elements constructed from time and/or 
frequency shifts of a prototype waveform b(t) will inherit the temporal 
and spectral widths of b(t). 

Keep in mind the fact that b(t) and $B(\Omega) = \int_{-\infty}^{\infty} b(t) e^{-i\Omega t}\, dt$ are 
alternate descriptions of the same waveform; we could have written 
$\Delta_\Omega(b(t) e^{i\Omega_0 t})$ in place of $\Delta_\Omega(B(\Omega - \Omega_0))$ above. 

2. The time-bandwidth product $\Delta_t \Delta_\Omega$ is invariant to time-scaling. 
[footnote] 

$$\Delta_t(b(at)) = \frac{1}{|a|}\, \Delta_t(b(t))$$

$$\Delta_\Omega(b(at)) = |a|\, \Delta_\Omega(b(t))$$

The above two equations imply 

$$\forall a \in \mathbb{R},\, a \ne 0: \; \Delta_t \Delta_\Omega(b(at)) = \Delta_t \Delta_\Omega(b(t))$$

Observe that time-domain expansion (i.e., |a| < 1) increases the 
temporal width but decreases the spectral width, while time-domain 
contraction (i.e., |a| > 1) does the opposite. This suggests that time- 
scaling might be a useful tool for the design of a basis element with a 
particular tradeoff between time and frequency resolution. On the other 
hand, scaling cannot simultaneously increase both time and frequency 
resolution. 

The invariance property holds also for frequency scaling, as implied by 
the Fourier transform property $b(at) \Leftrightarrow \frac{1}{|a|} B\left(\frac{\Omega}{a}\right)$. 


3. No waveform can have time-bandwidth product less than $\frac{1}{2}$: 

$$\Delta_t \Delta_\Omega \ge \frac{1}{2}$$

This is known as the time-frequency uncertainty principle. 

4. The Gaussian pulse g(t) achieves the minimum time-bandwidth 
product $\Delta_t \Delta_\Omega = \frac{1}{2}$: 

$$g(t) \propto e^{-\frac{1}{2}t^2}, \qquad G(\Omega) \propto e^{-\frac{1}{2}\Omega^2}$$

Note that this waveform is neither bandlimited nor time-limited, but 
reasonably concentrated in both domains (around the points $t_c = 0$ and 
$\Omega_c = 0$). 


Properties 1 and 2 can be easily verified using the definitions above. 
Properties 3 and 4 follow from the Cauchy-Schwarz inequality. 


Since the Gaussian pulse g(t) achieves the minimum time-bandwidth 
product, it makes for a theoretically good prototype waveform. In other 
words, we might consider constructing a basis from time shifted, frequency 
shifted, time scaled, or frequency scaled versions of g(t) to give a range of 
spectral/temporal centers and spectral/temporal resolutions. Since the 
Gaussian pulse has doubly-infinite time-support, though, other windows are 
used in practice. Basis construction from a prototype waveform is the main 
concept behind Short-Time Fourier Analysis and the continuous Wavelet 
transform discussed later. 


Short-time Fourier Transform 
This module introduces the short-time Fourier transform. 


We saw earlier that Fourier analysis is not well suited to describing local 
changes in "frequency content" because the frequency components defined 
by the Fourier transform have infinite (i.e., global) time support. For 
example, if we have a signal with periodic components plus a glitch at time 
$t_0$, we might want accurate knowledge of both the periodic component 
frequencies and the glitch time ([link]). 

[Figure: the signal $x(t)$ and its spectrum magnitude $|X(\Omega)|$.] 

The Short-Time Fourier Transform (STFT) provides a means of joint 
time-frequency analysis. The STFT pair can be written 


$$X_{STFT}(\Omega, \tau) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j\Omega t}\, dt$$

$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} X_{STFT}(\Omega, \tau)\, w(t - \tau)\, e^{j\Omega t}\, d\Omega\, d\tau$$

assuming real-valued $w(t)$ for which $\int |w(t)|^2\, dt = 1$. The STFT can be interpreted as a "sliding window CTFT": to calculate $X_{STFT}(\Omega, \tau)$, slide the center of window $w(t)$ to time $\tau$, window the input signal, and compute the CTFT of the result ([link]).

[Figure: "Sliding Window CTFT"]

The idea is to isolate the signal in the vicinity of time $\tau$, then perform a CTFT analysis in order to estimate the "local" frequency content at time $\tau$.
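
The sliding-window idea can be sketched in discrete time. The code below is an approximation for illustration only, not the continuous-time STFT itself: the sampling rate, the Hann window, the hop size, and the helper name `stft` are all assumptions introduced here. It builds a 100 Hz sinusoid with a glitch at $t_0 = 0.6$ s and recovers both the sinusoid frequency and the glitch time from the windowed spectra.

```python
import numpy as np

def stft(x, window, hop):
    """Discrete sliding-window transform: one FFT per window position."""
    L = len(window)
    starts = np.arange(0, len(x) - L + 1, hop)
    frames = np.stack([x[s:s + L] * window for s in starts])
    centers = starts + L // 2                  # sample index at each window center
    return np.fft.rfft(frames, axis=1), centers

fs = 1000.0                                    # sampling rate in Hz (assumed)
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 100 * t)                # periodic component at 100 Hz
x[600] += 5.0                                  # glitch at t0 = 0.6 s

window = np.hanning(64)
window /= np.sqrt(np.sum(window**2))           # unit energy, mirroring int |w|^2 dt = 1
X, centers = stft(x, window, hop=16)
freqs = np.fft.rfftfreq(len(window), 1 / fs)

peak_hz = freqs[np.argmax(np.abs(X[0]))]       # dominant frequency in the first frame
high = freqs > 300                             # band where only the glitch has energy
glitch_time = centers[np.argmax(np.sum(np.abs(X[:, high])**2, axis=1))] / fs
print(peak_hz, glitch_time)
```

With a 64-sample window the frequency estimate is only accurate to about one FFT bin (roughly 16 Hz here) and the glitch time to about one window length. This is the same time-frequency trade-off discussed in the text.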


Essentially, the STFT uses the basis elements

$$b_{\Omega,\tau}(t) = w(t - \tau)\, e^{j\Omega t}$$

over the range $\tau \in (-\infty, \infty)$ and $\Omega \in (-\infty, \infty)$. This can be understood as time and frequency shifts of the window function $w(t)$. The STFT basis is often illustrated by a tiling of the time-frequency plane, where each tile represents a particular basis element ([link]).

[Figure: STFT tiling of the time-frequency plane]

The height and width of a tile represent the spectral and temporal widths of the basis element, respectively, and the position of a tile represents the spectral and temporal centers of the basis element. Note that, while the tiling diagram suggests that the STFT uses a discrete set of time/frequency shifts, the STFT basis is really constructed from a continuum of time/frequency shifts.


Note that we can decrease spectral width $\Delta_\Omega$ at the cost of increased temporal width $\Delta_t$ by stretching basis waveforms in time, although the time-bandwidth product $\Delta_t \Delta_\Omega$ (i.e., the area of each tile) will remain constant ([link]).

[Figure: STFT tilings obtained with a wider and a narrower window]


Our observations can be summarized as follows:

• The time resolutions and frequency resolutions of every STFT basis element will equal those of the window $w(t)$. (All STFT tiles have the same shape.)

• The use of a wide window will give good frequency resolution but poor time resolution, while the use of a narrow window will give good time resolution but poor frequency resolution. (When tiles are stretched in one direction they shrink in the other.)

• The combined time-frequency resolution of the basis, proportional to $\frac{1}{\Delta_t \Delta_\Omega}$, is determined not by window width but by window shape. Of all shapes, the Gaussian [footnote] $w(t) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}t^2}$ gives the highest time-frequency resolution, although its infinite time-support makes it impossible to implement. (The Gaussian window results in tiles with minimum area.)

The STFT using a Gaussian window is known as the Gabor Transform (1946).


Finally, it is interesting to note that the STFT implies a particular definition of instantaneous frequency. Consider the linear chirp $x(t) = \sin(\Omega_0 t^2)$. From casual observation, we might expect an instantaneous frequency of $\Omega_0 \tau$ at time $\tau$ since

$$\sin(\Omega_0 t^2) = \sin(\Omega_0 \tau t) \quad \text{at } t = \tau$$

The STFT, however, will indicate a time-$\tau$ instantaneous frequency of

$$\frac{d}{dt}\left(\Omega_0 t^2\right)\Big|_{t=\tau} = 2\Omega_0 \tau$$

Note: The phase-derivative interpretation of instantaneous frequency only makes sense for signals containing exactly one sinusoid, though! In summary, always remember that the traditional notion of "frequency" applies only to the CTFT; we must be very careful when bending the notion to include, e.g., "instantaneous frequency", as the results may be unexpected!
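
The factor of two can be observed numerically. The following sketch is an illustration under assumed parameters (sampling rate, chirp rate $\Omega_0 = 2\pi \cdot 50$ rad/s$^2$, and a Gaussian window of width 0.02 s are all choices made here): it windows the chirp around $\tau = 1$ s and locates the spectral peak. The phase-derivative value $2\Omega_0\tau$ corresponds to 100 Hz, whereas the "casual" guess $\Omega_0\tau$ would correspond to 50 Hz.

```python
import numpy as np

fs = 2000.0                                    # sampling rate in Hz (assumed)
t = np.arange(0, 2, 1 / fs)
omega0 = 2 * np.pi * 50.0                      # chirp parameter in rad/s^2 (assumed)
x = np.sin(omega0 * t**2)                      # linear chirp

tau = 1.0                                      # analysis time in seconds
sigma = 0.02                                   # Gaussian window width in seconds
w = np.exp(-0.5 * ((t - tau) / sigma)**2)      # window centered at tau

X = np.fft.rfft(x * w)                         # spectrum of the windowed chirp
freqs_hz = np.fft.rfftfreq(len(t), 1 / fs)
f_peak = freqs_hz[np.argmax(np.abs(X))]

f_phase_derivative = 2 * omega0 * tau / (2 * np.pi)   # 2*Omega0*tau expressed in Hz
f_casual = omega0 * tau / (2 * np.pi)                 # Omega0*tau expressed in Hz
print(f_peak, f_phase_derivative, f_casual)
```

The measured peak should sit near the phase-derivative value rather than the casual estimate.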


Continuous Wavelet Transform 
This module introduces continuous wavelet transform. 


The STFT provided a means of (joint) time-frequency analysis with the 
property that spectral/temporal widths (or resolutions) were the same for all 
basis elements. Let's now take a closer look at the implications of uniform 
resolution. 


Consider two signals composed of sinusoids with frequency 1 Hz and 1.001 
Hz, respectively. It may be difficult to distinguish between these two signals 
in the presence of background noise unless many cycles are observed, 
implying the need for a many-second observation. Now consider two
signals with pure frequencies of 1000 Hz and 1001 Hz (again, a 0.1%
difference). Here it should be possible to distinguish the two signals in an
interval of much less than one second. In other words, good frequency 
resolution requires longer observation times as frequency decreases. Thus, 
it might be more convenient to construct a basis whose elements have larger 
temporal width at low frequencies. 


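The previous example motivates a multi-resolution time-frequency tiling of the form ([link]):

[Figure: multi-resolution tiling of the time-frequency plane]

The Continuous Wavelet Transform (CWT) accomplishes the above multi-resolution tiling by time-scaling and time-shifting a prototype function $\psi(t)$, often called the mother wavelet. The $a$-scaled and $\tau$-shifted basis elements are given by

$$\psi_{a,\tau}(t) = \frac{1}{\sqrt{|a|}}\, \psi\left(\frac{t - \tau}{a}\right)$$

where

$$a, \tau \in \mathbb{R}$$

$$\int_{-\infty}^{\infty} \psi(t)\, dt = 0$$

$$C_\psi = \int_{-\infty}^{\infty} \frac{|\Psi(\Omega)|^2}{|\Omega|}\, d\Omega < \infty$$

The conditions above imply that $\psi(t)$ is bandpass and sufficiently smooth. Assuming that $\|\psi(t)\| = 1$, the definition above ensures that $\|\psi_{a,\tau}(t)\| = 1$ for all $a$ and $\tau$. The CWT is then defined by the transform pair

$$X_{CWT}(a, \tau) = \int_{-\infty}^{\infty} x(t)\, \psi_{a,\tau}^*(t)\, dt$$

$$x(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{X_{CWT}(a, \tau)\, \psi_{a,\tau}(t)}{a^2}\, d\tau\, da$$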


In basis terms, the CWT says that a waveform can be decomposed into a
collection of shifted and stretched versions of the mother wavelet $\psi(t)$. As
such, it is usually said that wavelets perform a "time-scale" analysis rather
than a time-frequency analysis.


The Morlet wavelet is a classic example of the CWT. It employs a windowed complex exponential as the mother wavelet:

$$\psi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-j\Omega_0 t}\, e^{-\frac{t^2}{2}}$$

where it is typical to select $\Omega_0 = \pi \sqrt{\frac{2}{\log 2}}$. (See illustration.) While this wavelet does not exactly satisfy the conditions established earlier, since $\Psi(0) \approx 7 \times 10^{-7} \neq 0$, it can be corrected, though in practice the correction is negligible and usually ignored.

[Figure: illustration of the Morlet wavelet]
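
The size of the zero-mean violation quoted above can be reproduced with a short numerical check. This is a sketch under stated assumptions: it takes the mother wavelet to be a unit-variance Gaussian times $e^{-j\Omega_0 t}$ with the $\frac{1}{\sqrt{2\pi}}$ normalization, reads "log" as the natural logarithm, and estimates $\Psi(0) = \int \psi(t)\, dt$ with a simple Riemann sum. The variable names are illustrative.

```python
import numpy as np

omega0 = np.pi * np.sqrt(2.0 / np.log(2.0))      # typical Morlet choice, about 5.336

t = np.linspace(-10.0, 10.0, 20001)              # the Gaussian envelope is negligible beyond |t| = 10
dt = t[1] - t[0]
psi = np.exp(-1j * omega0 * t) * np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

Psi0_numeric = np.sum(psi) * dt                  # Riemann-sum estimate of Psi(0)
Psi0_closed = np.exp(-0.5 * omega0**2)           # closed form of the same Gaussian integral
print(Psi0_numeric.real, Psi0_closed)
```

Both numbers should be on the order of $7 \times 10^{-7}$, which is why the correction is usually ignored in practice.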


While the CWT discussed above is an interesting theoretical and 
pedagogical tool, the discrete wavelet transform (DWT) is much more 
practical. Before shifting our focus to the DWT, we take a step back and 
review some of the basic concepts from the branch of mathematics known 
as Hilbert Space theory (Vector Space, Normed Vector Space, Inner Product 
Space, Hilbert Space, Projection Theorem). These concepts will be essential 
in our development of the DWT. 


Hilbert Space Theory 


Hilbert spaces provide the mathematical foundation for signal processing
theory. In this section we attempt to clearly define some key Hilbert space
concepts, such as vectors, norms, inner products, subspaces, orthogonality,
orthonormal bases, and projections. The intent is not to bury you in
mathematics, but to familiarize you with the terminology, provide intuition,
and leave you with a "lookup table" for future reference.


Vector Space 


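• A vector space consists of the following four elements:

1. A set of vectors $V$,
2. A field of scalars $\mathbb{F}$ (where, for our purposes, $\mathbb{F}$ is either $\mathbb{R}$ or $\mathbb{C}$),
3. The operation of vector addition "+" (i.e., $+ : V \times V \to V$),
4. The operation of scalar multiplication "$\cdot$" (i.e., $\cdot : \mathbb{F} \times V \to V$)

for which the following properties hold. (Assume $x, y, z \in V$ and $\alpha, \beta \in \mathbb{F}$.)

Properties | Examples
commutativity | $x + y = y + x$
associativity | $(x + y) + z = x + (y + z)$; $(\alpha\beta) x = \alpha(\beta x)$
distributivity | $\alpha \cdot (x + y) = (\alpha \cdot x) + (\alpha \cdot y)$; $(\alpha + \beta) x = \alpha x + \beta x$
additive identity | there exists $0 \in V$ such that $x + 0 = x$ for all $x \in V$
additive inverse | for every $x \in V$ there exists $(-x) \in V$ such that $x + (-x) = 0$
multiplicative identity | $1 \cdot x = x$ for all $x \in V$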

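Important examples of vector spaces include

Properties | Examples
real N-vectors | $V = \mathbb{R}^N$, $\mathbb{F} = \mathbb{R}$
complex N-vectors | $V = \mathbb{C}^N$, $\mathbb{F} = \mathbb{C}$
sequences in "$\ell_p$" | $V = \{x[n] \mid \sum_{n=-\infty}^{\infty} |x[n]|^p < \infty\}$, $\mathbb{F} = \mathbb{C}$
functions in "$\mathcal{L}_p$" | $V = \{f(t) \mid \int_{-\infty}^{\infty} |f(t)|^p\, dt < \infty\}$, $\mathbb{F} = \mathbb{C}$

where we have assumed the usual definitions of addition and multiplication. From now on, we will denote the arbitrary vector space $(V, \mathbb{F}, +, \cdot)$ by the shorthand $V$ and assume the usual selection of $(\mathbb{F}, +, \cdot)$. We will also suppress the "$\cdot$" in scalar multiplication, so that $\alpha \cdot x$ becomes $\alpha x$.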


• A subspace of $V$ is a subset $M \subset V$ for which

1. $x + y \in M$ for all $x, y \in M$
2. $\alpha x \in M$ for all $x \in M$ and $\alpha \in \mathbb{F}$

Note: Every subspace must contain 0, and $V$ is a subspace of itself.

• The span of set $S \subset V$ is the subspace of $V$ containing all linear combinations of vectors in $S$. When $S = \{x_0, \dots, x_{N-1}\}$,

$$\operatorname{span}(S) = \left\{ \sum_{i=0}^{N-1} \alpha_i x_i \;\middle|\; \alpha_i \in \mathbb{F} \right\}$$

• A subset of linearly-independent vectors $\{x_0, \dots, x_{N-1}\} \subset V$ is called a basis for $V$ when its span equals $V$. In such a case, we say that $V$ has dimension $N$. We say that $V$ is infinite-dimensional [footnote] if it contains an infinite number of linearly independent vectors.

The definition of an infinite-dimensional basis would be complicated by issues relating to the convergence of infinite series. Hence we postpone discussion of infinite-dimensional bases until the Hilbert Space section.

• $V$ is a direct sum of two subspaces $M$ and $N$, written $V = M \oplus N$, iff every $x \in V$ has a unique representation $x = m + n$ for $m \in M$ and $n \in N$.

Note: This requires $M \cap N = \{0\}$.


Normed Vector Space 
Now we equip a vector space V with a notion of "size". 


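• A norm is a function $\|\cdot\| : V \to \mathbb{R}$ such that the following properties hold (for all $x, y \in V$ and all $\alpha \in \mathbb{F}$):

1. $\|x\| \ge 0$ with equality iff $x = 0$
2. $\|\alpha x\| = |\alpha| \cdot \|x\|$
3. $\|x + y\| \le \|x\| + \|y\|$ (the triangle inequality).

In simple terms, the norm measures the size of a vector. Adding the norm operation to a vector space yields a normed vector space. Important examples include:

1. $V = \mathbb{R}^N$: $\|(x_0, \dots, x_{N-1})^T\| = \sqrt{\sum_{i=0}^{N-1} x_i^2} = \sqrt{x^T x}$
2. $V = \mathbb{C}^N$: $\|(x_0, \dots, x_{N-1})^T\| = \sqrt{\sum_{i=0}^{N-1} |x_i|^2} = \sqrt{x^H x}$
3. $V = \ell_p$: $\|\{x[n]\}\| = \left(\sum_{n=-\infty}^{\infty} |x[n]|^p\right)^{1/p}$
4. $V = \mathcal{L}_p$: $\|f(t)\| = \left(\int_{-\infty}^{\infty} |f(t)|^p\, dt\right)^{1/p}$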


Inner Product Space 
Next we equip a normed vector space V with a notion of "direction". 


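• An inner product is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$ such that the following properties hold (for all $x, y, z \in V$ and all $\alpha \in \mathbb{F}$):

1. $\langle y, x \rangle = \langle x, y \rangle^*$
2. $\langle x, \alpha y \rangle = \alpha \langle x, y \rangle$ ...implying that $\langle \alpha x, y \rangle = \alpha^* \langle x, y \rangle$
3. $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$
4. $\langle x, x \rangle \ge 0$ with equality iff $x = 0$

In simple terms, the inner product measures the relative alignment between two vectors. Adding an inner product operation to a vector space yields an inner product space. Important examples include:

1. $V = \mathbb{R}^N$, $\langle x, y \rangle = x^T y$
2. $V = \mathbb{C}^N$, $\langle x, y \rangle = x^H y$
3. $V = \ell_2$, $\langle \{x[n]\}, \{y[n]\} \rangle = \sum_{n=-\infty}^{\infty} x^*[n]\, y[n]$
4. $V = \mathcal{L}_2$, $\langle f(t), g(t) \rangle = \int_{-\infty}^{\infty} f^*(t)\, g(t)\, dt$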

The inner products above are the "usual" choices for those spaces. 


The inner product naturally defines a norm:

$$\|x\| = \sqrt{\langle x, x \rangle}$$

though not every norm can be defined from an inner product. [footnote] Thus, an inner product space can be considered as a normed vector space with additional structure. Assume, from now on, that we adopt the inner-product norm when given a choice.

An example for inner product space $\mathcal{L}_2$ would be any norm $\|f\| = \left(\int_{-\infty}^{\infty} |f(t)|^p\, dt\right)^{1/p}$ such that $p > 2$.

• The Cauchy-Schwarz inequality says

$$|\langle x, y \rangle| \le \|x\| \cdot \|y\|$$

with equality iff there exists $\alpha \in \mathbb{F}$ such that $x = \alpha y$.

When $\langle x, y \rangle \in \mathbb{R}$, the inner product can be used to define an "angle" between vectors:

$$\cos(\theta) = \frac{\langle x, y \rangle}{\|x\| \cdot \|y\|}$$


• Vectors $x$ and $y$ are said to be orthogonal, denoted as $x \perp y$, when $\langle x, y \rangle = 0$. The Pythagorean theorem says:

$$x \perp y \;\Rightarrow\; \|x + y\|^2 = \|x\|^2 + \|y\|^2$$

Vectors $x$ and $y$ are said to be orthonormal when $x \perp y$ and $\|x\| = \|y\| = 1$.

• $x \perp S$ means $x \perp y$ for all $y \in S$. $S$ is an orthogonal set if $x \perp y$ for all $x, y \in S$ s.t. $x \neq y$. An orthogonal set $S$ is an orthonormal set if $\|x\| = 1$ for all $x \in S$. Some examples of orthonormal sets are

1. $\mathbb{R}^3$: $S = \left\{ (1, 0, 0)^T, (0, 1, 0)^T \right\}$
2. $\mathbb{C}^N$: Subsets of columns from unitary matrices
3. $\ell_2$: Subsets of shifted Kronecker delta functions $S \subset \{\{\delta[n - k]\} \mid k \in \mathbb{Z}\}$
4. $\mathcal{L}_2$: $S = \left\{ \frac{1}{\sqrt{T}} f(t - nT) \mid n \in \mathbb{Z} \right\}$ for unit pulse $f(t) = u(t) - u(t - T)$, unit step $u(t)$

where in each case we assume the usual inner product. 


Hilbert Spaces 


Now we consider inner product spaces with nice convergence properties that allow us to define 
countably-infinite orthonormal bases. 


• A Hilbert space is a complete inner product space. A complete [footnote] space is one where all Cauchy sequences converge to some vector within the space. For sequence $\{x_n\}$ to be Cauchy, the distance between its elements must eventually become arbitrarily small:

$$\forall \epsilon > 0 \;\; \exists N_\epsilon : \;\; \|x_n - x_m\| < \epsilon \;\text{ for all } n \ge N_\epsilon,\; m \ge N_\epsilon$$

For a sequence $\{x_n\}$ to be convergent to $x$, the distance between its elements and $x$ must eventually become arbitrarily small:

$$\forall \epsilon > 0 \;\; \exists N_\epsilon : \;\; \|x_n - x\| < \epsilon \;\text{ for all } n \ge N_\epsilon$$

Examples are listed below (assuming the usual inner products):

1. $V = \mathbb{R}^N$
2. $V = \mathbb{C}^N$
3. $V = \ell_2$ (i.e., square summable sequences)
4. $V = \mathcal{L}_2$ (i.e., square integrable functions)


The rational numbers provide an example of an incomplete set. We know that it is possible to 
construct a sequence of rational numbers which approximate an irrational number arbitrarily 
closely. It is easy to see that such a sequence will be Cauchy. However, the sequence will not 
converge to any rational number, and so the rationals cannot be complete. 

• We will always deal with separable Hilbert spaces, which are those that have a countable [footnote] orthonormal (ON) basis. A countable orthonormal basis for $V$ is a countable orthonormal set $S = \{x_k\}$ such that every vector in $V$ can be represented as a linear combination of elements in $S$:

$$\forall y \in V \;\; \exists \{\alpha_k\} : \;\; y = \sum_k \alpha_k x_k$$

Due to the orthonormality of $S$, the basis coefficients are given by

$$\alpha_k = \langle x_k, y \rangle$$

We can see this via:

$$\langle x_k, y \rangle = \left\langle x_k, \lim_{n \to \infty} \sum_{i=0}^{n} \alpha_i x_i \right\rangle = \lim_{n \to \infty} \left\langle x_k, \sum_{i=0}^{n} \alpha_i x_i \right\rangle = \lim_{n \to \infty} \sum_{i=0}^{n} \alpha_i \langle x_k, x_i \rangle = \alpha_k$$

where $\delta[k - i] = \langle x_k, x_i \rangle$ (where the second equality invokes the continuity of the inner product). In finite $n$-dimensional spaces (e.g., $\mathbb{R}^n$ or $\mathbb{C}^n$), any $n$-element ON set constitutes an ON basis. In infinite-dimensional spaces, we have the following equivalences:

1. $\{x_0, x_1, x_2, \dots\}$ is an ON basis
2. If $\langle x_i, y \rangle = 0$ for all $i$, then $y = 0$
3. $\|y\|^2 = \sum_i |\langle x_i, y \rangle|^2$ for all $y \in V$ (Parseval's theorem)
4. Every $y \in V$ is a limit of a sequence of vectors in $\operatorname{span}(\{x_0, x_1, x_2, \dots\})$

Examples of countable ON bases for various Hilbert spaces include:

1. $\mathbb{R}^N$: $\{e_0, \dots, e_{N-1}\}$ for $e_i = (0 \;\dots\; 0 \; 1 \; 0 \;\dots\; 0)^T$ with "1" in the $i$-th position
2. $\mathbb{C}^N$: same as $\mathbb{R}^N$
3. $\ell_2$: $\{\{\delta_i[n]\} \mid i \in \mathbb{Z}\}$, for $\{\delta_i[n]\} = \{\delta[n - i]\}$ (all shifts of the Kronecker sequence)
4. $\mathcal{L}_2$: to be constructed using wavelets ...


A countable set is a set with at most a countably-infinite number of elements. Finite sets are 
countable, as are any sets whose elements can be organized into an infinite list. Continuums 
(e.g., intervals of R) are uncountably infinite. 

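• Say $S$ is a subspace of Hilbert space $V$. The orthogonal complement of $S$ in $V$, denoted $S^\perp$, is the subspace defined by the set $\{x \in V \mid x \perp S\}$. When $S$ is closed, we can write $V = S \oplus S^\perp$.

• The orthogonal projection of $y$ onto $S$, where $S$ is a closed subspace of $V$, is

$$\hat{y} = \sum_i \langle x_i, y \rangle\, x_i$$

s.t. $\{x_i\}$ is an ON basis for $S$. Orthogonal projection yields the best approximation of $y$ in $S$:

$$\hat{y} = \arg\min_{x \in S} \|y - x\|$$

The approximation error $e = y - \hat{y}$ obeys the orthogonality principle:

$$e \perp S$$

We illustrate this concept using $V = \mathbb{R}^3$ ([link]) but stress that the same geometrical interpretation applies to any Hilbert space.

A proof of the orthogonality principle is:

$$e \perp S \;\Leftrightarrow\; \langle e, x_i \rangle = 0 \text{ for all } i$$

Equation:

$$\langle y - \hat{y}, x_i \rangle = \langle y, x_i \rangle - \left\langle \sum_j \langle x_j, y \rangle x_j,\; x_i \right\rangle = \langle y, x_i \rangle - \sum_j \langle x_j, y \rangle^* \langle x_j, x_i \rangle = \langle y, x_i \rangle - \sum_j \langle y, x_j \rangle\, \delta[j - i] = \langle y, x_i \rangle - \langle y, x_i \rangle = 0$$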
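
The projection formula and the orthogonality principle can be illustrated in $\mathbb{R}^3$. The sketch below is a finite-dimensional example with assumed data: a two-dimensional subspace $S$ is generated from a random matrix, an ON basis for it is obtained with a QR factorization, and the vector `y` is an arbitrary choice. None of these specifics come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))        # two random vectors spanning a plane S in R^3
Q, _ = np.linalg.qr(A)                 # columns of Q form an ON basis {x_0, x_1} for S

y = np.array([1.0, 2.0, 3.0])
coeffs = Q.T @ y                       # <x_i, y> for each basis vector
y_hat = Q @ coeffs                     # orthogonal projection: sum_i <x_i, y> x_i
e = y - y_hat                          # approximation error

# Orthogonality principle: the error has zero inner product with every basis vector.
error_inner_products = Q.T @ e

# Best approximation: no other point of S should be closer to y than y_hat.
trial_errors = [np.linalg.norm(y - Q @ rng.standard_normal(2)) for _ in range(100)]
print(error_inner_products, np.linalg.norm(e), min(trial_errors))
```

The inner products of the error with the basis should vanish to machine precision, and every trial point in $S$ should be at least as far from `y` as `y_hat` is.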


Discrete Wavelet Transform: Main Concepts 


Main Concepts 


The discrete wavelet transform (DWT) is a representation of a signal $x(t) \in \mathcal{L}_2$ using an orthonormal basis consisting of a countably-infinite set of wavelets. Denoting the wavelet basis as $\{\psi_{k,n}(t) \mid k \in \mathbb{Z}, n \in \mathbb{Z}\}$, the DWT transform pair is

Equation:

$$x(t) = \sum_{k=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} d_{k,n}\, \psi_{k,n}(t)$$

Equation:

$$d_{k,n} = \langle \psi_{k,n}(t), x(t) \rangle = \int_{-\infty}^{\infty} \psi_{k,n}^*(t)\, x(t)\, dt$$

where $\{d_{k,n}\}$ are the wavelet coefficients. Note the relationship to Fourier series and to the sampling theorem: in both cases we can perfectly describe a continuous-time signal $x(t)$ using a countably-infinite (i.e., discrete) set of coefficients. Specifically, Fourier series enabled us to describe periodic signals using Fourier coefficients $\{X[k] \mid k \in \mathbb{Z}\}$, while the sampling theorem enabled us to describe bandlimited signals using signal samples $\{x[n] \mid n \in \mathbb{Z}\}$. In both cases, signals within a limited class are represented using a coefficient set with a single countable index. The DWT can describe any signal in $\mathcal{L}_2$ using a coefficient set parameterized by two countable indices: $\{d_{k,n} \mid k \in \mathbb{Z}, n \in \mathbb{Z}\}$.


Wavelets are orthonormal functions in $\mathcal{L}_2$ obtained by shifting and stretching a mother wavelet, $\psi(t) \in \mathcal{L}_2$. For example,

Equation:

$$\psi_{k,n}(t) = 2^{-\frac{k}{2}}\, \psi(2^{-k} t - n), \quad k, n \in \mathbb{Z}$$

defines a family of wavelets $\{\psi_{k,n}(t) \mid k \in \mathbb{Z}, n \in \mathbb{Z}\}$ related by power-of-two stretches. As $k$ increases, the wavelet stretches by a factor of two; as $n$ increases, the wavelet shifts right.

Note: When $\|\psi(t)\| = 1$, the normalization ensures that $\|\psi_{k,n}(t)\| = 1$ for all $k \in \mathbb{Z}$, $n \in \mathbb{Z}$.


Power-of-two stretching is a convenient, though somewhat arbitrary, choice. In our treatment of the discrete wavelet transform, however, we will focus on this choice. Even with power-of-two stretches, there are various possibilities for $\psi(t)$, each giving a different flavor of DWT.

Wavelets are constructed so that $\{\psi_{k,n}(t) \mid n \in \mathbb{Z}\}$ (i.e., the set of all shifted wavelets at fixed scale $k$) describes a particular level of 'detail' in the signal. As $k$ becomes smaller (i.e., closer to $-\infty$), the wavelets become more "fine grained" and the level of detail increases. In this way, the DWT can give a multi-resolution description of a signal, very useful in analyzing "real-world" signals. Essentially, the DWT gives us a discrete multi-resolution description of a continuous-time signal in $\mathcal{L}_2$.

In the modules that follow, these DWT concepts will be developed "from scratch" using Hilbert space principles. To aid the development, we make use of the so-called scaling function $\phi(t) \in \mathcal{L}_2$, which will be used to approximate the signal up to a particular level of detail. As with wavelets, a family of scaling functions can be constructed via shifts and power-of-two stretches

Equation:

$$\phi_{k,n}(t) = 2^{-\frac{k}{2}}\, \phi(2^{-k} t - n), \quad k, n \in \mathbb{Z}$$

given mother scaling function $\phi(t)$. The relationships between wavelets and scaling functions will be elaborated upon later via theory and example.

Note: The inner-product expression for $d_{k,n}$, [link], is written for the general complex-valued case. In our treatment of the discrete wavelet transform, however, we will assume real-valued signals and wavelets. For this reason, we omit the complex conjugations in the remainder of our DWT discussions.


The Haar System as an Example of DWT 


The Haar basis is perhaps the simplest example of a DWT basis, and we 
will frequently refer to it in our DWT development. Keep in mind, however, 
that the Haar basis is only an example; there are many other ways of 
constructing a DWT decomposition. 


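For the Haar case, the mother scaling function is defined by [link] and [link].

Equation:

$$\phi(t) = \begin{cases} 1 & \text{if } 0 \le t < 1 \\ 0 & \text{otherwise} \end{cases}$$

[Figure: the Haar mother scaling function $\phi(t)$]

From the mother scaling function, we define a family of shifted and stretched scaling functions $\{\phi_{k,n}(t)\}$ according to [link] and [link]

Equation:

$$\phi_{k,n}(t) = 2^{-\frac{k}{2}}\, \phi(2^{-k} t - n) = 2^{-\frac{k}{2}}\, \phi\left(2^{-k}(t - n 2^k)\right), \quad k, n \in \mathbb{Z}$$

[Figure: $\phi_{k,n}(t)$, a pulse of height $2^{-k/2}$ extending from $t = n 2^k$ to $t = (n+1) 2^k$]

which are illustrated in [link] for various $k$ and $n$. [link] makes clear the principle that incrementing $n$ by one shifts the pulse one place to the right. Observe from [link] that $\{\phi_{k,n}(t) \mid n \in \mathbb{Z}\}$ is orthonormal for each $k$ (i.e., along each row of figures).

[Figure: Haar scaling functions $\phi_{k,n}(t)$ for various $k$ and $n$]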


A Hierarchy of Detail in the Haar System 


Given a mother scaling function $\phi(t) \in \mathcal{L}_2$ (the choice of which will be discussed later), let us construct scaling functions at "coarseness-level-$k$" and "shift-$n$" as follows:

$$\phi_{k,n}(t) = 2^{-\frac{k}{2}}\, \phi(2^{-k} t - n).$$

Let us then use $V_k$ to denote the subspace defined by linear combinations of scaling functions at the $k$-th level:

$$V_k = \operatorname{span}(\{\phi_{k,n}(t) \mid n \in \mathbb{Z}\}).$$

In the Haar system, for example, $V_0$ and $V_1$ consist of signals with the characteristics of $x_0(t)$ and $x_1(t)$ illustrated in [link].

[Figure: example signals $x_0(t) \in V_0$ and $x_1(t) \in V_1$]

We will be careful to choose a scaling function $\phi(t)$ which ensures that the following nesting property is satisfied:

$$\cdots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \subset \cdots$$

(coarse on the left, detailed on the right)

In other words, any signal in $V_k$ can be constructed as a linear combination of more detailed signals in $V_{k-1}$. (The Haar system gives proof that at least one such $\phi(t)$ exists.)

The nesting property can be depicted using the set-theoretic diagram, [link], where $V_{-1}$ is represented by the contents of the largest egg (which includes the smaller two eggs), $V_0$ is represented by the contents of the medium-sized egg (which includes the smallest egg), and $V_1$ is represented by the contents of the smallest egg.

[Figure: nested eggs depicting $V_1 \subset V_0 \subset V_{-1}$]


Going further, we will assume that $\phi(t)$ is designed to yield the following three important properties:

1. $\{\phi_{k,n}(t) \mid n \in \mathbb{Z}\}$ constitutes an orthonormal basis for $V_k$,

2. $V_\infty = \{0\}$ (contains no signals). [footnote]
While at first glance it might seem that $V_\infty$ should contain non-zero constant signals (e.g., $x(t) = a$ for $a \in \mathbb{R}$), the only constant signal in $\mathcal{L}_2$, the space of square-integrable signals, is the zero signal.

3. $V_{-\infty} = \mathcal{L}_2$ (contains all signals).

Because $\{\phi_{k,n}(t) \mid n \in \mathbb{Z}\}$ is an orthonormal basis, the best (in $\mathcal{L}_2$ norm) approximation of $x(t) \in \mathcal{L}_2$ at coarseness-level-$k$ is given by the orthogonal projection, [link]

Equation:

$$x_k(t) = \sum_{n=-\infty}^{\infty} c_{k,n}\, \phi_{k,n}(t)$$

Equation:

$$c_{k,n} = \langle \phi_{k,n}(t), x(t) \rangle$$

We will soon derive conditions on the scaling function $\phi(t)$ which ensure that the properties above are satisfied.


Haar Approximation at the kth Coarseness Level 


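It is instructive to consider the approximation of signal $x(t) \in \mathcal{L}_2$ at coarseness-level-$k$ of the Haar system. For the Haar case, projection of $x(t) \in \mathcal{L}_2$ onto $V_k$ is accomplished using the basis coefficients

Equation:

$$c_{k,n} = \int_{-\infty}^{\infty} \phi_{k,n}(t)\, x(t)\, dt = \int_{n 2^k}^{(n+1) 2^k} 2^{-\frac{k}{2}}\, x(t)\, dt$$

giving the approximation

Equation:

$$x_k(t) = \sum_{n=-\infty}^{\infty} c_{k,n}\, \phi_{k,n}(t) = \sum_{n=-\infty}^{\infty} \left( \int_{n 2^k}^{(n+1) 2^k} 2^{-\frac{k}{2}}\, x(t)\, dt \right) \phi_{k,n}(t) = \sum_{n=-\infty}^{\infty} \left( \frac{1}{2^k} \int_{n 2^k}^{(n+1) 2^k} x(t)\, dt \right) \left( 2^{\frac{k}{2}} \phi_{k,n}(t) \right)$$

where

$$\frac{1}{2^k} \int_{n 2^k}^{(n+1) 2^k} x(t)\, dt = \text{average value of } x(t) \text{ in the interval}$$

and $2^{\frac{k}{2}} \phi_{k,n}(t)$ has height 1 for every $k$.

This corresponds to taking the average value of the signal in each interval of width $2^k$ and approximating the function by a constant over that interval (see [link]).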
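
The interval-averaging interpretation can be sketched on a densely sampled signal. The code below is a discrete approximation written for illustration: the test signal, the sampling step `dt`, and the helper name `haar_approx` are assumptions made here, and integrals are replaced by Riemann sums.

```python
import numpy as np

def haar_approx(x, dt, k):
    """Level-k Haar coefficients and approximation of a sampled signal (Riemann-sum sketch)."""
    width = int(round(2.0**k / dt))                 # samples per interval of length 2^k
    blocks = x.reshape(-1, width)
    c = blocks.sum(axis=1) * dt * 2.0**(-k / 2)     # c_{k,n}: integral of 2^{-k/2} x(t)
    x_k = np.repeat(c * 2.0**(-k / 2), width)       # sum_n c_{k,n} phi_{k,n}(t)
    return c, x_k

dt = 1.0 / 64
t = np.arange(0, 8, dt)                             # 512 samples covering [0, 8)
x = np.sin(2 * np.pi * t / 8) + 0.5 * np.cos(2 * np.pi * t)

errors = {}
for k in (-2, 0, 2):
    c, x_k = haar_approx(x, dt, k)
    errors[k] = np.sqrt(np.sum((x - x_k)**2) * dt)  # L2 norm of the approximation error
print(errors)
```

Because the Haar subspaces are nested, the approximation error should not decrease as $k$ grows (coarser levels), and each constant segment of `x_k` should equal the mean of `x` over that interval.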


The Scaling Equation 


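Consider the level-1 subspace and its orthonormal basis:

Equation:

$$V_1 = \operatorname{span}(\{\phi_{1,n}(t) \mid n \in \mathbb{Z}\})$$

Equation:

$$\phi_{1,n}(t) = \frac{1}{\sqrt{2}}\, \phi\left(\frac{1}{2} t - n\right)$$

Since $V_1 \subset V_0$ (i.e., $V_0$ is more detailed than $V_1$) and since $\phi_{1,0}(t) \in V_0$, there must exist coefficients $\{h[n] \mid n \in \mathbb{Z}\}$ such that

Equation:

$$\phi_{1,0}(t) = \sum_{n=-\infty}^{\infty} h[n]\, \phi_{0,n}(t)$$

Equation:

$$\Leftrightarrow \quad \frac{1}{\sqrt{2}}\, \phi\left(\frac{1}{2} t\right) = \sum_{n=-\infty}^{\infty} h[n]\, \phi(t - n)$$

Equation:

Scaling Equation

$$\phi(t) = \sqrt{2} \sum_{n=-\infty}^{\infty} h[n]\, \phi(2t - n)$$

To be a valid scaling function, $\phi(t)$ must obey the scaling equation for some coefficient set $\{h[n]\}$.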


The Wavelet Scaling Equation 


The difference in detail between $V_k$ and $V_{k-1}$ will be described using $W_k$, the orthogonal complement of $V_k$ in $V_{k-1}$:

Equation:

$$V_{k-1} = V_k \oplus W_k$$

At times it will be convenient to write $W_k = V_k^\perp$. This concept is illustrated in the set-theoretic diagram, [link].

[Figure: set-theoretic diagram of $V_{k-1} = V_k \oplus W_k$]

Suppose now that, for each $k \in \mathbb{Z}$, we construct an orthonormal basis for $W_k$ and denote it by $\{\psi_{k,n}(t) \mid n \in \mathbb{Z}\}$. It turns out that, because every $V_k$ has a basis constructed from shifts and stretches of a mother scaling function (i.e., $\phi_{k,n}(t) = 2^{-\frac{k}{2}} \phi(2^{-k} t - n)$), every $W_k$ has a basis that can be constructed from shifts and stretches of a "mother wavelet" $\psi(t) \in \mathcal{L}_2$:

$$\psi_{k,n}(t) = 2^{-\frac{k}{2}}\, \psi(2^{-k} t - n).$$

The Haar system will soon provide us with a concrete example.

Let's focus, for the moment, on the specific case $k = 1$. Since $W_1 \subset V_0$, there must exist $\{g[n] \mid n \in \mathbb{Z}\}$ such that:

Equation:

$$\psi_{1,0}(t) = \sum_{n=-\infty}^{\infty} g[n]\, \phi_{0,n}(t)$$

$$\Leftrightarrow \quad \frac{1}{\sqrt{2}}\, \psi\left(\frac{1}{2} t\right) = \sum_{n=-\infty}^{\infty} g[n]\, \phi(t - n)$$

Equation:

Wavelet Scaling Equation

$$\psi(t) = \sqrt{2} \sum_{n=-\infty}^{\infty} g[n]\, \phi(2t - n)$$

To be a valid scaling-function/wavelet pair, $\phi(t)$ and $\psi(t)$ must obey the wavelet scaling equation for some coefficient set $\{g[n]\}$.


Conditions on h[n] and g[n] 


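Here we derive sufficient conditions on the coefficients used in the scaling equation and wavelet scaling equation that ensure, for every $k \in \mathbb{Z}$, that the sets $\{\phi_{k,n}(t) \mid n \in \mathbb{Z}\}$ and $\{\psi_{k,n}(t) \mid n \in \mathbb{Z}\}$ have the orthonormality properties described in The Scaling Equation and The Wavelet Scaling Equation.

For $\{\phi_{k,n}(t) \mid n \in \mathbb{Z}\}$ to be orthonormal at all $k$, we certainly need orthonormality when $k = 1$. This is equivalent to

Equation:

$$\delta[m] = \langle \phi_{1,0}(t), \phi_{1,m}(t) \rangle = \left\langle \sum_n h[n]\, \phi(t - n),\; \sum_\ell h[\ell]\, \phi(t - \ell - 2m) \right\rangle = \sum_n h[n] \sum_\ell h[\ell]\, \langle \phi(t - n), \phi(t - \ell - 2m) \rangle$$

where $\delta[n - \ell - 2m] = \langle \phi(t - n), \phi(t - \ell - 2m) \rangle$

Equation:

$$\delta[m] = \sum_{n=-\infty}^{\infty} h[n]\, h[n - 2m]$$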


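There is an interesting frequency-domain interpretation of the previous condition. If we define

Equation:

$$p[m] = h[m] * h[-m] = \sum_n h[n]\, h[n - m]$$

then we see that our condition is equivalent to $p[2m] = \delta[m]$. In the z-domain, this yields the pair of conditions

Equation:

Power-Symmetry Property

$$P(z) = H(z)\, H(z^{-1})$$

$$1 = \frac{1}{2} \sum_{p=0}^{1} P\left(z^{\frac{1}{2}} e^{-j\pi p}\right) = \frac{1}{2} P\left(z^{\frac{1}{2}}\right) + \frac{1}{2} P\left(-z^{\frac{1}{2}}\right)$$

Putting these together,

Equation:

$$2 = H\left(z^{\frac{1}{2}}\right) H\left(z^{-\frac{1}{2}}\right) + H\left(-z^{\frac{1}{2}}\right) H\left(-z^{-\frac{1}{2}}\right)$$

$$\Leftrightarrow \quad 2 = H(z)\, H(z^{-1}) + H(-z)\, H(-z^{-1})$$

$$\Leftrightarrow \quad 2 = \left|H(e^{j\omega})\right|^2 + \left|H(e^{j(\pi - \omega)})\right|^2$$

where the last property invokes the fact that $h[n] \in \mathbb{R}$ and that real-valued impulse responses yield conjugate-symmetric DTFTs. Thus we find that $h[n]$ are the impulse response coefficients of a power-symmetric filter. Recall that this property was also shared by the analysis filters in an orthogonal perfect-reconstruction FIR filterbank.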


Given orthonormality at level $k = 0$, we have now derived a condition on $h[n]$ which is necessary and sufficient for orthonormality at level $k = 1$. Yet the same condition is necessary and sufficient for orthonormality at level $k = 2$:

Equation:

$$\delta[m] = \langle \phi_{2,0}(t), \phi_{2,m}(t) \rangle = \left\langle \sum_n h[n]\, \phi_{1,n}(t),\; \sum_\ell h[\ell]\, \phi_{1,\ell+2m}(t) \right\rangle = \sum_n h[n] \sum_\ell h[\ell]\, \langle \phi_{1,n}(t), \phi_{1,\ell+2m}(t) \rangle = \sum_{n=-\infty}^{\infty} h[n]\, h[n - 2m]$$

where $\delta[n - \ell - 2m] = \langle \phi_{1,n}(t), \phi_{1,\ell+2m}(t) \rangle$. Using induction, we conclude that the previous condition will be necessary and sufficient for orthonormality of $\{\phi_{k,n}(t) \mid n \in \mathbb{Z}\}$ for all $k \in \mathbb{Z}$.


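To find conditions on $\{g[n]\}$ ensuring that the set $\{\psi_{k,n}(t) \mid n \in \mathbb{Z}\}$ is orthonormal at every $k$, we can repeat the steps above but with $g[n]$ replacing $h[n]$, $\psi_{k,n}(t)$ replacing $\phi_{k,n}(t)$, and the wavelet scaling equation replacing the scaling equation. This yields

Equation:

$$2 = G(z)\, G(z^{-1}) + G(-z)\, G(-z^{-1})$$

Next we derive a condition which guarantees that $W_k \perp V_k$, as required by our definition $W_k = V_k^\perp$ for all $k \in \mathbb{Z}$. Note that, for any $k \in \mathbb{Z}$, $W_k \perp V_k$ is guaranteed by $\{\psi_{k,n}(t) \mid n \in \mathbb{Z}\} \perp \{\phi_{k,n}(t) \mid n \in \mathbb{Z}\}$, which is equivalent to

Equation:

$$0 = \langle \psi_{k+1,0}(t), \phi_{k+1,m}(t) \rangle = \left\langle \sum_n g[n]\, \phi_{k,n}(t),\; \sum_\ell h[\ell]\, \phi_{k,\ell+2m}(t) \right\rangle = \sum_n g[n] \sum_\ell h[\ell]\, \langle \phi_{k,n}(t), \phi_{k,\ell+2m}(t) \rangle = \sum_n g[n]\, h[n - 2m]$$

for all $m$, where $\delta[n - \ell - 2m] = \langle \phi_{k,n}(t), \phi_{k,\ell+2m}(t) \rangle$. In other words, a 2-downsampled version of $g[n] * h[-n]$ must consist only of zeros. This necessary and sufficient condition can be restated in the frequency domain as

Equation:

$$0 = \frac{1}{2} \sum_{p=0}^{1} G\left(z^{\frac{1}{2}} e^{-j\pi p}\right) H\left(z^{-\frac{1}{2}} e^{j\pi p}\right)$$

or, equivalently (replacing $z^{\frac{1}{2}}$ by $z$), $0 = G(z)\, H(z^{-1}) + G(-z)\, H(-z^{-1})$.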


The choice

Equation:

$$G(z) = \pm z^{-P} H(-z^{-1}) \quad \text{for odd } P$$

satisfies our condition, since

$$G(z)\, H(z^{-1}) + G(-z)\, H(-z^{-1}) = \pm z^{-P} H(-z^{-1})\, H(z^{-1}) \mp z^{-P} H(z^{-1})\, H(-z^{-1}) = 0$$

In the time domain, the condition on $G(z)$ and $H(z)$ can be expressed

Equation:

$$g[n] = \pm (-1)^n\, h[P - n] \quad \text{for odd } P.$$

Recall that this property was satisfied by the analysis filters in an orthogonal perfect reconstruction FIR filterbank.

Note that the two conditions

$$G(z) = \pm z^{-P} H(-z^{-1}) \quad \text{for odd } P$$

$$2 = H(z)\, H(z^{-1}) + H(-z)\, H(-z^{-1})$$

are sufficient to ensure that both $\{\phi_{k,n}(t) \mid n \in \mathbb{Z}\}$ and $\{\psi_{k,n}(t) \mid n \in \mathbb{Z}\}$ are orthonormal for all $k$ and that $W_k \perp V_k$ for all $k$, since they satisfy the condition $2 = G(z)\, G(z^{-1}) + G(-z)\, G(-z^{-1})$ automatically.
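
The time-domain forms of these conditions are easy to test on concrete filters. The sketch below is illustrative only: the helper names are hypothetical, $P$ is taken as the filter length minus one, and the four-tap Daubechies coefficients are a well-known example brought in from outside this text. It checks $\sum_n h[n] h[n-2m] = \delta[m]$, the same property for $g[n]$, and $\sum_n g[n] h[n-2m] = 0$.

```python
import numpy as np

def corr_even_lag(a, b, m):
    """Return sum_n a[n] * b[n - 2m], treating both sequences as zero outside their support."""
    return sum(a[n] * b[n - 2 * m] for n in range(len(a)) if 0 <= n - 2 * m < len(b))

def wavelet_filter(h):
    """Build g[n] = (-1)^n h[P - n] with P = len(h) - 1, which is odd for even-length h."""
    P = len(h) - 1
    return np.array([(-1)**n * h[P - n] for n in range(len(h))])

s3 = np.sqrt(3.0)
filters = {
    "haar": np.array([1.0, 1.0]) / np.sqrt(2.0),
    # Four-tap Daubechies scaling filter (an outside example, not derived in this text).
    "daubechies_4tap": np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0)),
}

lags = range(-2, 3)
results = {}
for name, h in filters.items():
    g = wavelet_filter(h)
    results[name] = {
        "hh": [corr_even_lag(h, h, m) for m in lags],   # expect delta[m]
        "gg": [corr_even_lag(g, g, m) for m in lags],   # expect delta[m]
        "gh": [corr_even_lag(g, h, m) for m in lags],   # expect all zeros
    }
print(results)
```

For both filters, the `hh` and `gg` lists should be a unit impulse at lag zero, and the `gh` list should vanish.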


Values of g[n] and h[n] for the Haar System 


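The coefficients $\{h[n]\}$ were originally introduced to describe $\phi_{1,0}(t)$ in terms of the basis for $V_0$:

$$\phi_{1,0}(t) = \sum_n h[n]\, \phi_{0,n}(t)$$

From the previous equation we find that

Equation:

$$\langle \phi_{0,m}(t), \phi_{1,0}(t) \rangle = \left\langle \phi_{0,m}(t),\; \sum_n h[n]\, \phi_{0,n}(t) \right\rangle = \sum_n h[n]\, \langle \phi_{0,m}(t), \phi_{0,n}(t) \rangle = h[m]$$

where $\delta[n - m] = \langle \phi_{0,m}(t), \phi_{0,n}(t) \rangle$, which gives a way to calculate the coefficients $\{h[m]\}$ when we know $\phi_{k,n}(t)$.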


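In the Haar case

Equation:

$$h[m] = \int_{-\infty}^{\infty} \phi_{0,m}(t)\, \phi_{1,0}(t)\, dt = \int_{m}^{m+1} \phi_{1,0}(t)\, dt = \begin{cases} \frac{1}{\sqrt{2}} & \text{if } m \in \{0, 1\} \\ 0 & \text{otherwise} \end{cases}$$

since $\phi_{1,0}(t) = \frac{1}{\sqrt{2}}$ in the interval $[0, 2)$ and zero otherwise. Then choosing $P = 1$ in $g[n] = (-1)^n h[P - n]$, we find that

$$g[n] = \begin{cases} \frac{1}{\sqrt{2}} & \text{if } n = 0 \\ -\frac{1}{\sqrt{2}} & \text{if } n = 1 \\ 0 & \text{otherwise} \end{cases}$$

for the Haar system. From the wavelet scaling equation

$$\psi(t) = \sqrt{2} \sum_n g[n]\, \phi(2t - n) = \phi(2t) - \phi(2t - 1)$$

we can see that the Haar mother wavelet and scaling function look like in [link]:

[Figure: the Haar scaling function $\phi(t)$ and mother wavelet $\psi(t)$]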
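
These Haar values can be reproduced numerically. The sketch below is a check under assumptions chosen here: functions are sampled at interval midpoints so that the midpoint-rule integrals of piecewise-constant functions are essentially exact, and the names `phi`, `inner`, and `phi_rebuilt` are illustrative.

```python
import numpy as np

def phi(t):
    """Haar mother scaling function: 1 on [0, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 1), 1.0, 0.0)

dt = 1.0 / 1024
t = -1.0 + (np.arange(4 * 1024) + 0.5) * dt       # midpoints covering [-1, 3)

def inner(f, g):
    """Midpoint-rule approximation of the real inner product <f, g>."""
    return np.sum(f * g) * dt

phi_10 = phi(t / 2) / np.sqrt(2.0)                # phi_{1,0}(t) = 2^{-1/2} phi(t/2)
h = np.array([inner(phi(t - m), phi_10) for m in (0, 1)])   # h[m] = <phi_{0,m}, phi_{1,0}>
g = np.array([h[1], -h[0]])                       # g[n] = (-1)^n h[1 - n], i.e. P = 1

# Rebuild phi and psi from the scaling equation and the wavelet scaling equation.
phi_rebuilt = np.sqrt(2.0) * sum(h[n] * phi(2 * t - n) for n in (0, 1))
psi = np.sqrt(2.0) * sum(g[n] * phi(2 * t - n) for n in (0, 1))
print(h, g, inner(psi, psi), inner(psi, phi(t)))
```

The recovered `h` should equal $1/\sqrt{2}$ in both entries, `phi_rebuilt` should match $\phi(t)$, and `psi` should be a unit-norm function orthogonal to $\phi(t)$.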


It is now easy to see, in the Haar case, how integer shifts of the mother wavelet describe the differences between signals in $V_{-1}$ and $V_0$ ([link]):

[Figure: differences between signals in $V_{-1}$ and $V_0$]

We expect this because $V_{-1} = V_0 \oplus W_0$.


Wavelets: A Countable Orthonormal Basis for the Space of Square- 
Integrable Functions 


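Recall that $V_k = W_{k+1} \oplus V_{k+1}$ and that $V_{k+1} = W_{k+2} \oplus V_{k+2}$. Putting these together and extending the idea yields

Equation:

$$V_k = W_{k+1} \oplus W_{k+2} \oplus V_{k+2} = W_{k+1} \oplus W_{k+2} \oplus \cdots \oplus W_\ell \oplus V_\ell = W_{k+1} \oplus W_{k+2} \oplus W_{k+3} \oplus \cdots = \bigoplus_{i=k+1}^{\infty} W_i$$

If we take the limit as $k \to -\infty$, we find that

Equation:

$$\mathcal{L}_2 = V_{-\infty} = \bigoplus_{i=-\infty}^{\infty} W_i$$

Moreover,

Equation:

$$(W_1 \perp V_1) \wedge (W_{k \ge 2} \subset V_1) \Rightarrow (W_1 \perp W_{k \ge 2})$$

Equation:

$$(W_2 \perp V_2) \wedge (W_{k \ge 3} \subset V_2) \Rightarrow (W_2 \perp W_{k \ge 3})$$

from which it follows that

Equation:

$$W_k \perp W_{j \neq k}$$

or, in other words, all subspaces $W_k$ are orthogonal to one another. Since the functions $\{\psi_{k,n}(t) \mid n \in \mathbb{Z}\}$ form an orthonormal basis for $W_k$, the results above imply that

Equation:

$$\{\psi_{k,n}(t) \mid n, k \in \mathbb{Z}\} \text{ constitutes an orthonormal basis for } \mathcal{L}_2$$

This implies that, for any $f(t) \in \mathcal{L}_2$, we can write

Equation:

$$f(t) = \sum_{k=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} d_k[m]\, \psi_{k,m}(t)$$

Equation:

$$d_k[m] = \langle \psi_{k,m}(t), f(t) \rangle$$

This is the key idea behind the orthogonal wavelet system that we have been developing!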


Filterbanks Interpretation of the Discrete Wavelet Transform 


Assume that we start with a signal $x(t) \in \mathcal{L}_2$. Denote the best approximation at the 0th level of coarseness by $x_0(t)$. (Recall that $x_0(t)$ is the orthogonal projection of $x(t)$ onto $V_0$.) Our goal, for the moment, is to decompose $x_0(t)$ into scaling coefficients and wavelet coefficients at higher levels. Since $x_0(t) \in V_0$ and $V_0 = V_1 \oplus W_1$, there exist coefficients $\{c_0[n]\}$, $\{c_1[n]\}$, and $\{d_1[n]\}$ such that

Equation:

$$x_0(t) = \sum_n c_0[n]\, \phi_{0,n}(t) = \sum_n c_1[n]\, \phi_{1,n}(t) + \sum_n d_1[n]\, \psi_{1,n}(t)$$

Using the fact that $\{\phi_{1,n}(t) \mid n \in \mathbb{Z}\}$ is an orthonormal basis for $V_1$, in conjunction with the scaling equation,

Equation:

$$c_1[n] = \langle x_0(t), \phi_{1,n}(t) \rangle = \left\langle \sum_m c_0[m]\, \phi_{0,m}(t),\; \phi_{1,n}(t) \right\rangle = \sum_m c_0[m]\, \langle \phi_{0,m}(t), \phi_{1,n}(t) \rangle$$

$$= \sum_m c_0[m] \left\langle \phi(t - m),\; \sum_\ell h[\ell]\, \phi(t - \ell - 2n) \right\rangle = \sum_m c_0[m] \sum_\ell h[\ell]\, \langle \phi(t - m), \phi(t - \ell - 2n) \rangle = \sum_m c_0[m]\, h[m - 2n]$$

where $\delta[m - \ell - 2n] = \langle \phi(t - m), \phi(t - \ell - 2n) \rangle$. The previous expression ([link]) indicates that $\{c_1[n]\}$ results from convolving $\{c_0[m]\}$ with a time-reversed version of $h[m]$ then downsampling by factor two ([link]).

[Figure: $c_0[m] \to H(z^{-1}) \to \downarrow 2 \to c_1[n]$]


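Using the fact that $\{\psi_{1,n}(t) \mid n \in \mathbb{Z}\}$ is an orthonormal basis for $W_1$, in conjunction with the wavelet scaling equation,

Equation:

$$d_1[n] = \langle x_0(t), \psi_{1,n}(t) \rangle = \left\langle \sum_m c_0[m]\, \phi_{0,m}(t),\; \psi_{1,n}(t) \right\rangle = \sum_m c_0[m]\, \langle \phi_{0,m}(t), \psi_{1,n}(t) \rangle$$

$$= \sum_m c_0[m] \left\langle \phi(t - m),\; \sum_\ell g[\ell]\, \phi(t - \ell - 2n) \right\rangle = \sum_m c_0[m] \sum_\ell g[\ell]\, \langle \phi(t - m), \phi(t - \ell - 2n) \rangle = \sum_m c_0[m]\, g[m - 2n]$$

where $\delta[m - \ell - 2n] = \langle \phi(t - m), \phi(t - \ell - 2n) \rangle$.

The previous expression ([link]) indicates that $\{d_1[n]\}$ results from convolving $\{c_0[m]\}$ with a time-reversed version of $g[m]$ then downsampling by factor two ([link]).

[Figure: $c_0[m] \to G(z^{-1}) \to \downarrow 2 \to d_1[n]$]

Putting these two operations together, we arrive at what looks like the analysis portion of an FIR filterbank ([link]):

[Figure: two-channel analysis filterbank; $c_0[m]$ feeds $H(z^{-1}) \to \downarrow 2 \to c_1[n]$ and $G(z^{-1}) \to \downarrow 2 \to d_1[n]$]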

We can repeat this process at the next higher level. Since $V_1 = W_2 \oplus V_2$, there exist coefficients $\{c_2[n]\}$ and $\{d_2[n]\}$ such that

Equation:

$$x_1(t) = \sum_n c_1[n]\, \phi_{1,n}(t) = \sum_n d_2[n]\, \psi_{2,n}(t) + \sum_n c_2[n]\, \phi_{2,n}(t)$$

Using the same steps as before we find that

Equation:

$$c_2[n] = \sum_m c_1[m]\, h[m - 2n]$$

Equation:

$$d_2[n] = \sum_m c_1[m]\, g[m - 2n]$$

which gives a cascaded analysis filterbank ([link]):

[Figure: cascaded analysis filterbank; $c_0[m]$ feeds $G(z^{-1}) \to \downarrow 2 \to d_1[n]$ and $H(z^{-1}) \to \downarrow 2 \to c_1[m]$, which in turn feeds $G(z^{-1}) \to \downarrow 2 \to d_2[n]$ and $H(z^{-1}) \to \downarrow 2 \to c_2[n]$]

If we use $V_0 = W_1 \oplus W_2 \oplus W_3 \oplus \cdots \oplus W_k \oplus V_k$ to repeat this process up to the $k$-th level, we get the iterated analysis filterbank ([link]).

[Figure: iterated analysis filterbank]

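As we might expect, signal reconstruction can be accomplished using cascaded two-channel synthesis filterbanks. 
Using the same assumptions as before, we have: 
Equation: 

$$\begin{aligned}
c_0[m] &= \langle x_0(t), \phi_{0,m}(t) \rangle \\
&= \left\langle \sum_n c_1[n]\,\phi_{1,n}(t) + \sum_n d_1[n]\,\psi_{1,n}(t),\ \phi_{0,m}(t) \right\rangle \\
&= \sum_n c_1[n]\, \langle \phi_{1,n}(t), \phi_{0,m}(t) \rangle + \sum_n d_1[n]\, \langle \psi_{1,n}(t), \phi_{0,m}(t) \rangle \\
&= \sum_n c_1[n]\, h[m-2n] + \sum_n d_1[n]\, g[m-2n]
\end{aligned}$$

where $h[m-2n] = \langle \phi_{1,n}(t), \phi_{0,m}(t) \rangle$ 
and $g[m-2n] = \langle \psi_{1,n}(t), \phi_{0,m}(t) \rangle$, 

which can be implemented using the block diagram in [link]. 

[Figure: two-channel synthesis filterbank. $c_1[n]$ → ↑2 → $H(z)$ and $d_1[n]$ → ↑2 → $G(z)$; the two branch outputs are summed to give $c_0[m]$.]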
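The synthesis sum above can be checked numerically against the analysis sums. The sketch below, assuming Daubechies db2 taps, $g[n] = (-1)^n h[N-1-n]$, and wrap-around indexing of a length-8 test sequence (all illustrative choices), analyzes a sequence and then rebuilds it with $c_0[m] = \sum_n c_1[n] h[m-2n] + \sum_n d_1[n] g[m-2n]$.

```python
import math

r3 = math.sqrt(3.0)
# Daubechies db2 lowpass taps (assumed here as the example filter)
h = [v / (4.0 * math.sqrt(2.0)) for v in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]
N = len(h)
g = [(-1) ** n * h[N - 1 - n] for n in range(N)]

def analyze(c0):
    """c1[n] = sum_m c0[m] h[m-2n], d1[n] = sum_m c0[m] g[m-2n] (indices mod M)."""
    M = len(c0)
    c1 = [sum(h[l] * c0[(2 * n + l) % M] for l in range(N)) for n in range(M // 2)]
    d1 = [sum(g[l] * c0[(2 * n + l) % M] for l in range(N)) for n in range(M // 2)]
    return c1, d1

def synthesize(c1, d1):
    """c0[m] = sum_n c1[n] h[m-2n] + sum_n d1[n] g[m-2n] (indices mod M)."""
    M = 2 * len(c1)
    c0 = [0.0] * M
    for n in range(len(c1)):
        for l in range(N):
            c0[(2 * n + l) % M] += c1[n] * h[l] + d1[n] * g[l]
    return c0

x = [1.0, 3.0, -2.0, 0.5, 4.0, 4.0, -1.0, 2.0]
c1, d1 = analyze(x)
x_rec = synthesize(c1, d1)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print(max_err)
```

The reconstruction error should be at the level of floating-point round-off, which is what perfect reconstruction predicts.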


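The same procedure can be used to derive 
Equation: 

$$c_1[m] = \sum_n c_2[n]\, h[m-2n] + \sum_n d_2[n]\, g[m-2n]$$

from which we get the diagram in [link]. 

[Figure: two-stage synthesis filterbank. $c_2[n]$ → ↑2 → $H(z)$ and $d_2[n]$ → ↑2 → $G(z)$ are summed to give $c_1[m]$; then $c_1[m]$ → ↑2 → $H(z)$ and $d_1[m]$ → ↑2 → $G(z)$ are summed to give $c_0[m]$.]

To reconstruct from the $k$th level, we can use the iterated synthesis filterbank ([link]). 

[Figure: iterated synthesis filterbank. Starting from $c_k[n]$ and $d_k[n]$, each stage upsamples by two, filters with $H(z)$ and $G(z)$, and sums, adding $d_{k-1}[n], \ldots, d_1[n]$ in turn until $c_0[m]$ is recovered.]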


The table makes a direct comparison between wavelets and the two-channel orthogonal PR-FIR filterbanks. 

| | Discrete Wavelet Transform | 2-Channel Orthogonal PR-FIR Filterbank |
| Analysis LPF | $H(z^{-1})$ | $H_0(z)$ |
| Power Symmetry | $H(z)H(z^{-1}) + H(-z)H(-z^{-1}) = 2$ | $H_0(z)H_0(z^{-1}) + H_0(-z)H_0(-z^{-1}) = 1$ |
| Analysis HPF | $G(z^{-1})$ | $H_1(z)$ |
| Spectral Reverse | $G(z) = \pm z^{-P} H(-z^{-1})$, $P$ odd | $H_1(z) = \pm z^{-(N-1)} H_0(-z^{-1})$, $N$ even |
| Synthesis LPF | $H(z)$ | $G_0(z) = 2 z^{-(N-1)} H_0(z^{-1})$ |
| Synthesis HPF | $G(z)$ | $G_1(z) = 2 z^{-(N-1)} H_1(z^{-1})$ |


From the table, we see that the discrete wavelet transform that we have been developing is identical to the two-channel 
orthogonal PR-FIR filterbank in all but a couple of details. 


1. Orthogonal PR-FIR filterbanks employ synthesis filters with twice the gain of the analysis filters, whereas in 
the DWT the gains are equal. 

2. Orthogonal PR-FIR filterbanks employ causal filters of length N, whereas the DWT filters are not 
constrained to be causal. 


For convenience, however, the wavelet filters H(z) and G(z) are usually chosen to be causal. For both to have 
even impulse response length N, we require that P = N — 1. 


Initialization of the Wavelet Transform 


The filterbanks developed in the module on the filterbanks interpretation of 
the DWT start with the signal representation $\{c_0[n] \mid n \in \mathbb{Z}\}$ and break the 
representation down into wavelet coefficients and scaling coefficients at 
lower resolutions (i.e., higher levels $k$). The question remains: how do we 
get the initial coefficients $\{c_0[n]\}$? 


From their definition, we see that the scaling coefficients can be written 
using a convolution: 
Equation: 

$$\begin{aligned}
c_0[n] &= \langle \phi(t-n), x(t) \rangle \\
&= \int_{-\infty}^{\infty} \phi(t-n)\, x(t)\, dt \\
&= \phi(-t) * x(t) \big|_{t=n}
\end{aligned}$$

which suggests that the proper initialization of the wavelet transform is 
accomplished by passing the continuous-time input $x(t)$ through an analog 
filter with impulse response $\phi(-t)$ and sampling its output at integer times 
([link]). 

[Figure: $x(t)$ → analog filter $\phi(-t)$ → sampler at $t = n$ → $c_0[n]$ → iterated analysis filterbank with branches $H(z^{-1})$, ↓2 and $G(z^{-1})$, ↓2.]

Practically speaking, however, it is very difficult to build an analog filter 
with impulse response $\phi(-t)$ for typical choices of scaling function. 


The most often-used approximation is to set $c_0[n] = x[n]$. The sampling 
theorem implies that this would be exact if $\phi(t) = \frac{\sin(\pi t)}{\pi t}$, though clearly 
this is not correct for general $\phi(t)$. Still, this approximation is somewhat 
justified if we adopt the view that the principal advantage of the wavelet 
transform comes from the multi-resolution capabilities implied by an 
iterated perfect-reconstruction filterbank (with good filters). 


Regularity Conditions, Compact Support, and Daubechies' Wavelets 


Here we give a quick description of what is probably the most popular 
family of filter coefficients $h[n]$ and $g[n]$: those proposed by Daubechies. 


Recall the iterated synthesis filterbank. Applying the Noble identities, we 
can move the up-samplers before the filters, as illustrated in [link]. 


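[Figure: iterated synthesis filterbank after applying the Noble identities. Branches: $c_3[n]$ → ↑8 → $H(z^4)H(z^2)H(z)$; $d_3[n]$ → ↑8 → $G(z^4)H(z^2)H(z)$; $d_2[n]$ → ↑4 → $G(z^2)H(z)$; $d_1[n]$ → ↑2 → $G(z)$; the branch outputs are summed.]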


The properties of the $i$-stage cascaded lowpass filter 
Equation: 

$$H^{(i)}(z) = \prod_{k=0}^{i-1} H\!\left(z^{2^k}\right), \quad i \ge 1$$

in the limit $i \to \infty$ give an important characterization of the wavelet 
system. But how do we know that $\lim_{i \to \infty} H^{(i)}(e^{j\omega})$ converges to a response 
in $L_2$? In fact, there are some rather strict conditions on $H(e^{j\omega})$ that must 
be satisfied for this convergence to occur. Without such convergence, we 
might have a finite-stage perfect reconstruction filterbank, but we will not 
have a countable wavelet basis for $L_2$. Below we present some "regularity 
conditions" on $H(e^{j\omega})$ that ensure convergence of the iterated synthesis 
lowpass filter. 


Note:The convergence of the lowpass filter implies convergence of all 
other filters in the bank. 


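Let us denote the impulse response of $H^{(i)}(z)$ by $h^{(i)}[n]$. Writing 
$H^{(i)}(z) = H\!\left(z^{2^{i-1}}\right) H^{(i-1)}(z)$ 
in the time domain, we have 

$$h^{(i)}[n] = \sum_k h[k]\, h^{(i-1)}\!\left[n - 2^{i-1} k\right]$$

Now define the function 

$$\phi^{(i)}(t) = 2^{i/2} \sum_n h^{(i)}[n]\, \mathbb{1}_{[n/2^i,\,(n+1)/2^i)}(t)$$

where $\mathbb{1}_{[a,b)}(t)$ denotes the indicator function over the interval $[a, b)$: 

$$\mathbb{1}_{[a,b)}(t) = \begin{cases} 1 & \text{if } t \in [a,b) \\ 0 & \text{if } t \notin [a,b) \end{cases}$$

The definition of $\phi^{(i)}(t)$ implies 
Equation: 

$$\forall t,\ t \in \left[\tfrac{n}{2^i}, \tfrac{n+1}{2^i}\right) : \quad h^{(i)}[n] = 2^{-i/2}\, \phi^{(i)}(t)$$

Equation: 

$$\forall t,\ t \in \left[\tfrac{n}{2^i}, \tfrac{n+1}{2^i}\right) : \quad h^{(i-1)}\!\left[n - 2^{i-1}k\right] = 2^{-(i-1)/2}\, \phi^{(i-1)}(2t - k)$$

and plugging the two previous expressions into the equation for $h^{(i)}[n]$ 
yields 
Equation: 

$$\phi^{(i)}(t) = \sqrt{2} \sum_k h[k]\, \phi^{(i-1)}(2t - k).$$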


Thus, if $\phi^{(i)}(t)$ converges pointwise to a continuous function, then it must 
satisfy the scaling equation, so that $\lim_{i \to \infty} \phi^{(i)}(t) = \phi(t)$. Daubechies 
showed that, for pointwise convergence of $\phi^{(i)}(t)$ to a continuous function 
in $L_2$, it is sufficient that $H(e^{j\omega})$ can be factored as 
Equation: 

$$\exists P,\ P \ge 1 : \quad H(e^{j\omega}) = \sqrt{2} \left( \frac{1 + e^{j\omega}}{2} \right)^{P} R(e^{j\omega})$$

for $R(e^{j\omega})$ such that 
Equation: 

$$\sup_\omega \left| R(e^{j\omega}) \right| < 2^{P-1}$$

Here $P$ denotes the number of zeros that $H(e^{j\omega})$ has at $\omega = \pi$. Such 
conditions are called regularity conditions because they ensure the 
regularity, or smoothness, of $\phi(t)$. In fact, if we make the previous 
condition stronger: 

Equation: 

$$\exists \ell,\ \ell \ge 1 : \quad \sup_\omega \left| R(e^{j\omega}) \right| < 2^{P-1-\ell}$$

then $\lim_{i \to \infty} \phi^{(i)}(t) = \phi(t)$ for $\phi(t)$ that is $\ell$-times continuously 
differentiable. 


There is an interesting and important by-product of the preceding analysis. 
If $h[n]$ is a causal length-$N$ filter, it can be shown that $h^{(i)}[n]$ is causal with 
length $N_i = (2^i - 1)(N - 1) + 1$. By construction, then, $\phi^{(i)}(t)$ will be zero 
outside the interval $\left[0, \frac{(2^i - 1)(N-1) + 1}{2^i}\right)$. Assuming that the regularity conditions 
are satisfied so that $\lim_{i \to \infty} \phi^{(i)}(t) = \phi(t)$, it follows that $\phi(t)$ must be zero 
outside the interval $[0, N-1]$. In this case we say that $\phi(t)$ has compact 
support. Finally, the wavelet scaling equation implies that, when $\phi(t)$ is 
compactly supported on $[0, N-1]$ and $g[n]$ is length $N$, $\psi(t)$ will also be 
compactly supported on the interval $[0, N-1]$. 
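The length claim can be checked by building $h^{(i)}[n]$ explicitly. The following sketch, using a hypothetical length-4 filter chosen only for illustration, forms $H^{(i)}(z) = H(z)H(z^2)\cdots H(z^{2^{i-1}})$ by repeated convolution with upsampled copies of $h[n]$ and records the resulting lengths and the right edge of the support of $\phi^{(i)}(t)$.

```python
def upsample(h, factor):
    """Insert factor-1 zeros between taps, i.e. the impulse response of H(z^factor)."""
    out = [0.0] * ((len(h) - 1) * factor + 1)
    for k, v in enumerate(h):
        out[k * factor] = v
    return out

def convolve(a, b):
    """Plain linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def iterated_filter(h, i):
    """Impulse response of H^(i)(z) = H(z) H(z^2) ... H(z^(2^(i-1)))."""
    hi = list(h)
    for stage in range(1, i):
        hi = convolve(hi, upsample(h, 2 ** stage))
    return hi

h = [0.25, 0.75, 0.75, 0.25]   # hypothetical length-4 filter, for illustration only
N = len(h)
lengths = [len(iterated_filter(h, i)) for i in range(1, 6)]
support_ends = [((2 ** i - 1) * (N - 1) + 1) / 2 ** i for i in range(1, 6)]
print(lengths)
print(support_ends)
```

The support edges increase toward $N - 1$ without reaching it, consistent with the compact-support argument.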


Daubechies constructed a family of $H(z)$ with impulse response lengths 
$N \in \{4, 6, 8, 10, \ldots\}$ which satisfy the regularity conditions. Moreover, 
her filters have the maximum possible number of zeros at $\omega = \pi$, and thus 
are maximally regular (i.e., they yield the smoothest possible $\phi(t)$ for a 
given support interval). It turns out that these filters are the maximally flat 
filters derived by Herrmann long before filterbanks and wavelets were in 
vogue. In [link] and [link] we show $\phi(t)$, $\Phi(\Omega)$, $\psi(t)$, and $\Psi(\Omega)$ for 
various members of the Daubechies' wavelet system. 


See Vetterli and Kovačević for a more complete discussion of these matters. 


Computing the Scaling Function: The Cascade Algorithm 
This module shows how to compute the scaling function. It also has a 
section with a proof for an assumption made for the computation. 


Given coefficients $\{h[n]\}$ that satisfy the regularity conditions, we can 
iteratively calculate samples of $\phi(t)$ on a fine grid of points $\{t\}$ using the 
cascade algorithm. Once we have obtained $\phi(t)$, the wavelet scaling 
equation can be used to construct $\psi(t)$. 


In this discussion we assume that $H(z)$ is causal with impulse response 
length $N$. Recall, from our discussion of the regularity conditions, that this 
implies $\phi(t)$ will have compact support on the interval $[0, N-1]$. The 
cascade algorithm is described below. 


1. Consider the scaling function at integer times 
$t = m \in \{0, \ldots, N-1\}$: 

$$\phi(m) = \sqrt{2} \sum_{n=0}^{N-1} h[n]\, \phi(2m - n)$$

Knowing that $\phi(t) = 0$ for $t \notin [0, N-1]$, the previous equation can 
be written using an $N \times N$ matrix. In the case where $N = 4$, we have 

Equation: 

$$\begin{pmatrix} \phi(0) \\ \phi(1) \\ \phi(2) \\ \phi(3) \end{pmatrix} = \sqrt{2} \begin{pmatrix} h[0] & 0 & 0 & 0 \\ h[2] & h[1] & h[0] & 0 \\ 0 & h[3] & h[2] & h[1] \\ 0 & 0 & 0 & h[3] \end{pmatrix} \begin{pmatrix} \phi(0) \\ \phi(1) \\ \phi(2) \\ \phi(3) \end{pmatrix}$$

where 

$$H = \begin{pmatrix} h[0] & 0 & 0 & 0 \\ h[2] & h[1] & h[0] & 0 \\ 0 & h[3] & h[2] & h[1] \\ 0 & 0 & 0 & h[3] \end{pmatrix}$$

The matrix $H$ is structured as a row-decimated convolution matrix. 
From the matrix equation above ([link]), we see that 
$(\phi(0)\ \phi(1)\ \phi(2)\ \phi(3))^T$ must be (some scaled version of) the 
eigenvector of $H$ corresponding to eigenvalue $(\sqrt{2})^{-1}$. In general, 
the nonzero values of $\{\phi(n) \mid n \in \mathbb{Z}\}$, i.e., $(\phi(0)\ \phi(1)\ \ldots\ \phi(N-1))^T$, 
can be calculated by appropriately scaling the eigenvector of the $N \times N$ 
row-decimated convolution matrix $H$ corresponding to the 
eigenvalue $(\sqrt{2})^{-1}$. It can be shown that this eigenvector must be 
scaled so that $\sum_{n=0}^{N-1} \phi(n) = 1$. 
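2. Given $\{\phi(n) \mid n \in \mathbb{Z}\}$, we can use the scaling equation to determine 
$\{\phi(\tfrac{n}{2}) \mid n \in \mathbb{Z}\}$: 

Equation: 

$$\phi\!\left(\tfrac{m}{2}\right) = \sqrt{2} \sum_{n=0}^{N-1} h[n]\, \phi(m - n)$$

This produces the $2N - 1$ non-zero samples 
$\{\phi(0), \phi(1/2), \phi(1), \phi(3/2), \ldots, \phi(N-1)\}$. 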


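3. Given $\{\phi(\tfrac{n}{2}) \mid n \in \mathbb{Z}\}$, the scaling equation can be used to find 
$\{\phi(\tfrac{n}{4}) \mid n \in \mathbb{Z}\}$: 

Equation: 

$$\begin{aligned}
\phi\!\left(\tfrac{m}{4}\right) &= \sqrt{2} \sum_{n=0}^{N-1} h[n]\, \phi\!\left(\tfrac{m}{2} - n\right) \\
&= \sqrt{2} \sum_{p \text{ even}} h\!\left[\tfrac{p}{2}\right] \phi\!\left(\tfrac{m-p}{2}\right) \\
&= \sqrt{2} \sum_{p} h_{\uparrow 2}[p]\, \phi_{1/2}[m - p]
\end{aligned}$$

where $h_{\uparrow 2}[p]$ denotes the impulse response of $H(z^2)$, i.e., a 2-
upsampled version of $h[n]$, and where $\phi_{1/2}[m] = \phi(\tfrac{m}{2})$. Note that 
$\{\phi(\tfrac{n}{4}) \mid n \in \mathbb{Z}\}$ is the result of convolving $h_{\uparrow 2}[n]$ with $\{\phi_{1/2}[n]\}$. 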


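4. Given $\{\phi(\tfrac{n}{4}) \mid n \in \mathbb{Z}\}$, another convolution yields $\{\phi(\tfrac{n}{8}) \mid n \in \mathbb{Z}\}$: 
Equation: 

$$\phi\!\left(\tfrac{m}{8}\right) = \sqrt{2} \sum_{n=0}^{N-1} h[n]\, \phi\!\left(\tfrac{m}{4} - n\right) = \sqrt{2} \sum_{p} h_{\uparrow 4}[p]\, \phi_{1/4}[m - p]$$

where $h_{\uparrow 4}[n]$ is a 4-upsampled version of $h[n]$ and where 
$\phi_{1/4}[m] = \phi(\tfrac{m}{4})$. 
5. At the $\ell$th stage, $\{\phi(\tfrac{n}{2^\ell})\}$ is calculated by convolving the result of the 
$(\ell-1)$th stage with a $2^{\ell-1}$-upsampled version of $h[n]$: 
Equation: 

$$\phi_{1/2^{\ell}}[m] = \sqrt{2} \sum_{p} h_{\uparrow 2^{\ell-1}}[p]\, \phi_{1/2^{\ell-1}}[m - p]$$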


For $\ell \approx 10$, this gives a very good approximation of $\phi(t)$. At this point, 
you could verify the key properties of $\phi(t)$, such as orthonormality and the 
satisfaction of the scaling equation. 


In [link] we show steps 1 through 5 of the cascade algorithm, as well as step 
10, using Daubechies' db2 coefficients (for which $N = 4$). 

[Figure: iterations of the cascade algorithm for the db2 scaling function.]
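The steps above can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than a reference implementation: it uses the db2 taps, obtains the integer samples by power iteration on $\sqrt{2}H$ (an illustrative substitute for a general eigenvector routine; for db2 the eigenvalue 1 of $\sqrt{2}H$ is the dominant one), and then applies the refinement $\phi_{1/2^{\ell}}[m] = \sqrt{2}\sum_n h[n]\,\phi_{1/2^{\ell-1}}[m - 2^{\ell-1}n]$.

```python
import math

r3 = math.sqrt(3.0)
h = [v / (4.0 * math.sqrt(2.0)) for v in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]  # db2 taps
N = len(h)
SQRT2 = math.sqrt(2.0)

# Step 1: integer samples phi(0..N-1) as the eigenvector of H with eigenvalue
# 1/sqrt(2), found here by power iteration on sqrt(2)*H and scaled to sum to 1.
phi = [1.0 / N] * N
for _ in range(200):
    phi = [SQRT2 * sum(h[2 * m - n] * phi[n] for n in range(N) if 0 <= 2 * m - n < N)
           for m in range(N)]
    total = sum(phi)
    phi = [v / total for v in phi]

def refine(prev, step):
    """Given phi on a grid with `step` samples per unit time, return 2*step per unit.

    new[m] = sqrt(2) * sum_n h[n] * prev[m - n*step], i.e. a convolution of the
    previous samples with a step-upsampled copy of h.
    """
    size = (N - 1) * 2 * step + 1
    out = [0.0] * size
    for m in range(size):
        acc = 0.0
        for n in range(N):
            i = m - n * step
            if 0 <= i < len(prev):
                acc += h[n] * prev[i]
        out[m] = SQRT2 * acc
    return out

levels = 6                      # number of refinement stages (illustrative)
phi_fine, step = phi, 1
for _ in range(levels):
    phi_fine = refine(phi_fine, step)
    step *= 2

# phi_fine[i] approximates phi(i / step) for i = 0 .. (N-1)*step
print(len(phi_fine), phi_fine[step], phi_fine[2 * step])
```

A quick sanity check is that the samples on the fine grid still integrate to one (as a Riemann sum) and that the integer samples are unchanged by the refinement.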


Finite-Length Sequences and the DWT Matrix 


The wavelet transform, viewed from a filterbank perspective, consists of 
iterated 2-channel analysis stages like the one in [link]. 

[Figure: one analysis stage. $c_k[m]$ → $H(z^{-1})$ → ↓2 → $c_{k+1}[n]$; $c_k[m]$ → $G(z^{-1})$ → ↓2 → $d_{k+1}[n]$.]

First consider a very long (i.e., practically infinite-length) sequence 
$\{c_k[m] \mid m \in \mathbb{Z}\}$. For every pair of input samples $\{c_k[2n], c_k[2n-1]\}$ that 
enter the $k$th filterbank stage, exactly one pair of output samples 
$\{c_{k+1}[n], d_{k+1}[n]\}$ is generated. In other words, the number of outputs 
equals the number of inputs during a fixed time interval. This property is 
convenient from a real-time processing perspective. 


For a short sequence $\{c_k[m] \mid m \in \{0, \ldots, M-1\}\}$, however, linear 
convolution requires that we make an assumption about the tails of our 
finite-length sequence. One assumption could be 

Equation: 

$$\forall m,\ m \notin \{0, \ldots, M-1\} : \quad c_k[m] = 0$$

In this case, the linear convolution implies that $M$ nonzero inputs yield 
$\frac{M+N}{2} - 1$ outputs from each branch, for a total of 
$2\left(\frac{M+N}{2} - 1\right) = M + N - 2 > M$ outputs. Here we have assumed that 
both $H(z^{-1})$ and $G(z^{-1})$ have impulse response lengths of $N > 2$, and that 
$M$ and $N$ are both even. The fact that each filterbank stage produces more 
outputs than inputs is very disadvantageous in many applications. 


A more convenient assumption regarding the tails of 
$\{c_k[m] \mid m \in \{0, \ldots, M-1\}\}$ is that the data outside of the time window 
$\{0, \ldots, M-1\}$ is a cyclic extension of data inside the time window. In other 
words, given a length-$M$ sequence, the points outside the sequence are 
related to points inside the sequence via 

Equation: 

$$c_k[m] = c_k[m + M]$$

Recall that a linear convolution with an $M$-cyclic input is equivalent to a 
circular convolution with one $M$-sample period of the input sequence. 
Furthermore, the output of this circular convolution is itself $M$-cyclic, 
implying our 2-downsampled branch outputs are cyclic with period $M/2$. Thus, 
given an $M$-length input sequence, the total filterbank output consists of 
exactly $M$ values. 


It is instructive to write the circular-convolution analysis filterbank operation 
in matrix form. In [link] we give an example for filter length $N = 4$, 
sequence length $M = 8$, and causal synthesis filters $H(z)$ and $G(z)$. 
Equation: 

$$\begin{pmatrix} c_{k+1}[0] \\ c_{k+1}[1] \\ c_{k+1}[2] \\ c_{k+1}[3] \\ d_{k+1}[0] \\ d_{k+1}[1] \\ d_{k+1}[2] \\ d_{k+1}[3] \end{pmatrix} = \begin{pmatrix} h[0] & h[1] & h[2] & h[3] & 0 & 0 & 0 & 0 \\ 0 & 0 & h[0] & h[1] & h[2] & h[3] & 0 & 0 \\ 0 & 0 & 0 & 0 & h[0] & h[1] & h[2] & h[3] \\ h[2] & h[3] & 0 & 0 & 0 & 0 & h[0] & h[1] \\ g[0] & g[1] & g[2] & g[3] & 0 & 0 & 0 & 0 \\ 0 & 0 & g[0] & g[1] & g[2] & g[3] & 0 & 0 \\ 0 & 0 & 0 & 0 & g[0] & g[1] & g[2] & g[3] \\ g[2] & g[3] & 0 & 0 & 0 & 0 & g[0] & g[1] \end{pmatrix} \begin{pmatrix} c_k[0] \\ c_k[1] \\ c_k[2] \\ c_k[3] \\ c_k[4] \\ c_k[5] \\ c_k[6] \\ c_k[7] \end{pmatrix}$$

which we write compactly as 

$$\begin{pmatrix} c_{k+1} \\ d_{k+1} \end{pmatrix} = \begin{pmatrix} H_M \\ G_M \end{pmatrix} c_k$$

where $c_{k+1}$ and $d_{k+1}$ are the top and bottom halves of the left-hand vector, 
$H_M$ and $G_M$ are the top four and bottom four rows of the $8 \times 8$ matrix, 
and $c_k = (c_k[0], \ldots, c_k[7])^T$. 

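The matrices $H_M$ and $G_M$ have interesting properties. For example, the 
conditions 

$$\delta[m] = \sum_n h[n]\, h[n - 2m]$$

$$g[n] = (-1)^n h[N - 1 - n]$$

imply that 

$$\begin{pmatrix} H_M \\ G_M \end{pmatrix}^T \begin{pmatrix} H_M \\ G_M \end{pmatrix} = \begin{pmatrix} H_M \\ G_M \end{pmatrix} \begin{pmatrix} H_M \\ G_M \end{pmatrix}^T = I_M$$

where $I_M$ denotes the $M \times M$ identity matrix. Thus, it makes sense to define 
the $M \times M$ DWT matrix as 
Equation: 

$$T_M = \begin{pmatrix} H_M \\ G_M \end{pmatrix}$$

whose transpose constitutes the $M \times M$ inverse DWT matrix: 
Equation: 

$$T_M^{-1} = T_M^T$$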
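The orthogonality of $T_M$ can be confirmed numerically. The sketch below, assuming db2 taps and $M = 8$ (illustrative choices), builds $T_M$ row by row from $h[m-2n]$ and $g[m-2n]$ with wrap-around and measures how far $T_M^T T_M$ is from the identity; the helper names `dwt_matrix` and `gram` are hypothetical.

```python
import math

def dwt_matrix(h, M):
    """Stack H_M over G_M: row n of H_M holds h[m - 2n] at column m (indices mod M)."""
    N = len(h)
    g = [(-1) ** n * h[N - 1 - n] for n in range(N)]
    T = [[0.0] * M for _ in range(M)]
    for n in range(M // 2):
        for l in range(N):
            col = (2 * n + l) % M
            T[n][col] += h[l]
            T[M // 2 + n][col] += g[l]
    return T

def gram(T):
    """Return the matrix product T^T T."""
    M = len(T)
    return [[sum(T[r][i] * T[r][j] for r in range(M)) for j in range(M)]
            for i in range(M)]

r3 = math.sqrt(3.0)
h = [v / (4.0 * math.sqrt(2.0)) for v in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]  # db2 taps
M = 8
T = dwt_matrix(h, M)
G = gram(T)
deviation = max(abs(G[i][j] - (1.0 if i == j else 0.0))
                for i in range(M) for j in range(M))
print(deviation)
```

The deviation should sit at round-off level, so multiplying by the transpose inverts the transform.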


Since the synthesis filterbank ([link]) 

[Figure: one synthesis stage. $c_{k+1}[n]$ → ↑2 → $H(z)$ and $d_{k+1}[n]$ → ↑2 → $G(z)$; the branch outputs are summed to give $c_k[m]$.]

gives perfect reconstruction, and since the cascade of matrix operations 
$T_M^T T_M$ also corresponds to perfect reconstruction, we expect that the matrix 
operation $T_M^T$ describes the action of the synthesis filterbank. This is readily 
confirmed by writing the upsampled circular convolutions in matrix form: 
Equation: 

$$\begin{pmatrix} c_k[0] \\ c_k[1] \\ c_k[2] \\ c_k[3] \\ c_k[4] \\ c_k[5] \\ c_k[6] \\ c_k[7] \end{pmatrix} = \begin{pmatrix} h[0] & 0 & 0 & h[2] & g[0] & 0 & 0 & g[2] \\ h[1] & 0 & 0 & h[3] & g[1] & 0 & 0 & g[3] \\ h[2] & h[0] & 0 & 0 & g[2] & g[0] & 0 & 0 \\ h[3] & h[1] & 0 & 0 & g[3] & g[1] & 0 & 0 \\ 0 & h[2] & h[0] & 0 & 0 & g[2] & g[0] & 0 \\ 0 & h[3] & h[1] & 0 & 0 & g[3] & g[1] & 0 \\ 0 & 0 & h[2] & h[0] & 0 & 0 & g[2] & g[0] \\ 0 & 0 & h[3] & h[1] & 0 & 0 & g[3] & g[1] \end{pmatrix} \begin{pmatrix} c_{k+1}[0] \\ c_{k+1}[1] \\ c_{k+1}[2] \\ c_{k+1}[3] \\ d_{k+1}[0] \\ d_{k+1}[1] \\ d_{k+1}[2] \\ d_{k+1}[3] \end{pmatrix}$$

where the $8 \times 8$ matrix is 

$$\begin{pmatrix} H_M^T & G_M^T \end{pmatrix} = T_M^T$$

So far we have concentrated on one stage in the wavelet decomposition; a 
two-stage decomposition is illustrated in [link]. 

[Figure: two-stage analysis filterbank. $c_k[m]$ → $H(z^{-1})$ → ↓2 → $c_{k+1}[n]$, which is split again by $H(z^{-1})$, ↓2 into $c_{k+2}[n]$ and by $G(z^{-1})$, ↓2 into $d_{k+2}[n]$; $c_k[m]$ → $G(z^{-1})$ → ↓2 → $d_{k+1}[n]$.]

The two-stage analysis operation (assuming circular convolution) can be 
expressed in matrix form as 

Equation: 

$$\begin{pmatrix} c_{k+2} \\ d_{k+2} \\ d_{k+1} \end{pmatrix} = \begin{pmatrix} T_{M/2} & 0 \\ 0 & I_{M/2} \end{pmatrix} T_M\, c_k$$

Similarly, a three-stage analysis could be implemented via 
Equation: 

$$\begin{pmatrix} c_{k+3} \\ d_{k+3} \\ d_{k+2} \\ d_{k+1} \end{pmatrix} = \begin{pmatrix} T_{M/4} & 0 & 0 \\ 0 & I_{M/4} & 0 \\ 0 & 0 & I_{M/2} \end{pmatrix} \begin{pmatrix} T_{M/2} & 0 \\ 0 & I_{M/2} \end{pmatrix} T_M\, c_k$$

It should now be evident how to extend this procedure to more than 3 stages. As noted 
earlier, the corresponding synthesis operations are accomplished by 
transposing the matrix products used in the analysis. 


DWT Implementation using FFTs 


Finally, we say a few words about DWT implementation. Here we focus on 
a single DWT stage and assume circular convolution, yielding an $M \times M$ 
DWT matrix $T_M$. In the general case, $M \times M$ matrix multiplication requires 
$M^2$ multiplications. The DWT matrices, however, have a circular-convolution 
structure which allows us to implement them using 
significantly fewer multiplies. Below we present some simple and reasonably 
efficient approaches for the implementation of $T_M$ and $T_M^T$. 


We treat the inverse DWT first. Recall that in the lowpass synthesis branch, 
we upsample the input before circularly convolving with $H(z)$. Denoting 
the upsampled coefficient sequence by $a[n]$, fast circular convolution 
$a[n] * h[n]$ can be described as follows (using Matlab notation) 


ifft( fft(a).*fft(h,length(a)) ) 


where we have assumed that length(a) ≥ length(h). (When implementing the 
multi-level transform, you must ensure that the data length does not become 
shorter than the filter length!) The highpass branch is handled similarly 
using $G(z)$, after which the two branch outputs are summed. 


Next we treat the forward DWT. Recall that in the lowpass analysis branch, 
we circularly convolve the input with $H(z^{-1})$ and then downsample the 
result. The fast circular convolution $a[n] * h[-n]$ can be implemented using 


wshift('1', ifft(fft(a).*fft(flipud(h),length(a))), length(h)-1 ) 


where wshift accomplishes a circular shift of the ifft output that makes 
up for the unwanted delay of length(h)-1 samples imposed by the 
flipud operation. The highpass branch is handled similarly but with filter 
$G(z^{-1})$. Finally, each branch is downsampled by factor two. 


We note that the proposed approach is not totally efficient because 
downsampling is performed after circular convolution (and upsampling 
before circular convolution). Still, we have outlined this approach because 
it is easy to understand and still results in major savings when $M$ is large: it 
converts the $O(M^2)$ matrix multiply into an $O(M \log_2 M)$ operation. 
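For readers without Matlab, the same idea can be sketched in plain Python. The radix-2 `fft` below is a small illustrative implementation (it assumes the length is a power of two), and the analysis branch uses the conjugate of the filter spectrum in place of the flipud-and-shift step; for real $h[n]$ that is the same circular correlation $\sum_m a[m]h[m-2n]$. None of these helper names come from the text.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def ifft(X):
    """Inverse FFT via conjugation."""
    n = len(X)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in X])]

def circ_conv_fft(a, h):
    """Circular convolution a[n] * h[n], as used in the synthesis branch."""
    M = len(a)
    A, H = fft(a), fft(list(h) + [0.0] * (M - len(h)))
    return [v.real for v in ifft([p * q for p, q in zip(A, H)])]

def analysis_lowpass_fft(a, h):
    """c[n] = sum_m a[m] h[m - 2n] (indices mod M): correlate, then keep even lags."""
    M = len(a)
    A, H = fft(a), fft(list(h) + [0.0] * (M - len(h)))
    r = ifft([p * q.conjugate() for p, q in zip(A, H)])
    return [r[2 * n].real for n in range(M // 2)]

r3 = math.sqrt(3.0)
h = [v / (4.0 * math.sqrt(2.0)) for v in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]  # db2 taps
a = [1.0, 3.0, -2.0, 0.5, 4.0, 4.0, -1.0, 2.0]
M = len(a)

conv_direct = [sum(h[l] * a[(n - l) % M] for l in range(len(h))) for n in range(M)]
low_direct = [sum(h[l] * a[(2 * n + l) % M] for l in range(len(h))) for n in range(M // 2)]
conv_err = max(abs(u - v) for u, v in zip(circ_conv_fft(a, h), conv_direct))
low_err = max(abs(u - v) for u, v in zip(analysis_lowpass_fft(a, h), low_direct))
print(conv_err, low_err)
```

Both errors should be at round-off level, confirming that the FFT route reproduces the direct circular sums.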


DWT Applications - Choice of phi(t) 


Transforms are signal processing tools that are used to give a clear view of 
essential signal characteristics. Fourier transforms are ideal for infinite- 
duration signals that contain a relatively small number of sinusoids: one can 
completely describe the signal using only a few coefficients. Fourier 
transforms, however, are not well-suited to signals of a non-sinusoidal 
nature (as discussed earlier in the context of time-frequency analysis). The 
multi-resolution DWT is a more general transform that is well-suited to a 
larger class of signals. For the DWT to give an efficient description of the 
signal, however, we must choose a wavelet $\psi(t)$ from which the signal can 
be constructed (to a good approximation) using only a few stretched and 
shifted copies. 


We illustrate this concept in [link] using two examples. On the left, we 
analyze a step-like waveform, while on the right we analyze a chirp-like 
waveform. In both cases, we try DWTs based on the Haar and Daubechies 
db10 wavelets and plot the log magnitudes of the transform coefficients. 


Observe that the Haar DWT yields an extremely efficient representation of 
the step-waveform: only a few of the transform coefficients are nonzero. 
The db10 DWT does not give an efficient representation: many 
coefficients are sizable. This makes sense because the Haar scaling function 
is well matched to the step-like nature of the time-domain signal. In 
contrast, the Haar DWT does not give an efficient representation of the 
chirp-like waveform, while the db10 DWT does better. This makes sense 
because the sharp edges of the Haar scaling function do not match the 
smooth chirp signal, while the smoothness of the db10 wavelet yields a 
better match. 


DWT Application - De-noising 


Say that the DWT for a particular choice of wavelet yields an efficient 
representation of a particular signal class. In other words, signals in the 
class are well-described using a few large transform coefficients. 


Now consider unstructured noise, which cannot be efficiently represented 
by any transform, including the DWT. Due to the orthogonality of the 
DWT, such noise sequences make, on average, equal contributions to all 
transform coefficients. Any given noise sequence is expected to yield many 
small-valued transform coefficients. 


Together, these two ideas suggest a means of de-noising a signal. Say that 
we perform a DWT on a signal from our well-matched signal class that has 
been corrupted by additive noise. We expect that large transform 
coefficients are composed mostly of signal content, while small transform 
coefficients should be composed mostly of noise content. Hence, throwing 
away the transform coefficients whose magnitude is less than some small 
threshold should improve the signal-to-noise ratio. The de-noising 
procedure is illustrated in [link]. 


[Figure: noisy signal → DWT → threshold → IDWT → de-noised signal]
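The DWT, threshold, IDWT chain can be sketched with a multi-level Haar transform. Everything specific below is an assumption made for illustration: the length-16 step signal, the uniform noise of amplitude 0.05 drawn from a seeded generator, and the threshold of 0.1.

```python
import math
import random

S = 1.0 / math.sqrt(2.0)

def haar_dwt(x):
    """Full multi-level Haar DWT of a length-2^J list: [c_J, d_J, ..., d_1]."""
    c, details = list(x), []
    while len(c) > 1:
        d = [(c[2 * n] - c[2 * n + 1]) * S for n in range(len(c) // 2)]
        c = [(c[2 * n] + c[2 * n + 1]) * S for n in range(len(c) // 2)]
        details = d + details          # coarser details go in front
    return c + details

def haar_idwt(w):
    """Inverse of haar_dwt."""
    c, pos = w[:1], 1
    while pos < len(w):
        d = w[pos:pos + len(c)]
        pos += len(c)
        nxt = []
        for cn, dn in zip(c, d):
            nxt += [(cn + dn) * S, (cn - dn) * S]
        c = nxt
    return c

rng = random.Random(0)
clean = [0.0] * 8 + [1.0] * 8                       # step-like waveform
noisy = [v + rng.uniform(-0.05, 0.05) for v in clean]

threshold = 0.1
w = haar_dwt(noisy)
w_thr = [v if abs(v) >= threshold else 0.0 for v in w]  # discard small coefficients
denoised = haar_idwt(w_thr)

err_before = sum((a - b) ** 2 for a, b in zip(noisy, clean))
err_after = sum((a - b) ** 2 for a, b in zip(denoised, clean))
print(err_before, err_after)
```

Because the step falls on a dyadic boundary, the clean signal occupies only two Haar coefficients, so thresholding removes noise from the remaining coefficients while leaving the signal-bearing ones (and their noise) in place.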


Now we give an example of denoising a step-like waveform using the Haar 
DWT. In [link], the top right subplot shows the noisy signal and the top left 
shows its DWT coefficients. Note the presence of a few large DWT 
coefficients, expected to contain mostly signal components, as well as the 
presence of many small-valued coefficients, expected to contain noise. (The 
bottom left subplot shows the DWT for the original signal before any noise 
was added, which confirms that all signal energy is contained within a few 
large coefficients.) If we throw away all DWT coefficients whose 
magnitude is less than 0.1, we are left with only the large coefficients 
(shown in the middle left plot) which correspond to the de-noised time-
domain signal shown in the middle right plot. The difference between the 
de-noised signal and the original noiseless signal is shown in the bottom 
right. Non-zero error results from noise contributions to the large 
coefficients; there is no way of distinguishing these noise components from 
signal components. 


Overview of Multirate Signal Processing 


Multirate signal processing is the digital transformation of the sampling rate 
of signals, or signal processing with different sampling rates in the system. 


Applications 


1. Sampling-rate conversion: CD to DAT format change, for example. 

2. Improved D/A, A/D conversion: oversampling converters, which 
reduce performance requirements on anti-aliasing or reconstruction 
filters. 

3. FDM channel modulation and processing: the bandwidth of individual 
channels is much less than the overall bandwidth. 

4. Subband coding of speech and images: eyes and ears are not as 
sensitive to errors in higher frequency bands, so many coding schemes 
split signals into different frequency bands and quantize 
higher-frequency bands with much less precision. 


Outline of Multirate DSP material 


1. General Rate-changing System 

2. Integer-factor Interpolation and Decimation and Rational-factor Rate 
Changing 

3. Efficient Multirate Filter Structures 

4. Optimal Filter Design for Multirate Systems 

5. Multi-stage Multirate Systems 

6. Oversampling D/As 

7. Perfect-Reconstruction Filter Banks and Quadrature Mirror Filters 


General Rate-Changing Procedure 


This procedure is motivated by an analog-based method: one conceptually 
simple method to change the sampling rate is to simply convert a digital 
signal to an analog signal and resample it! ([link]) 


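[Figure: $x_0(n)$ → D/A → $x_a(t)$ → anti-aliasing filter $H_{aa}(\Omega)$ → A/D → $x_1(m)$.]

$$H_{aa}(\Omega) = \begin{cases} 1 & \text{if } |\Omega| < \frac{\pi}{T_1} \\ 0 & \text{otherwise} \end{cases}$$

$$h_{aa}(t) = \frac{\sin\left(\frac{\pi}{T_1} t\right)}{\pi t}$$

Recall the ideal D/A: 
Equation: 

$$x_a(t) = \sum_{n=-\infty}^{\infty} x_0(n)\, \frac{\sin\left(\frac{\pi (t - nT_0)}{T_0}\right)}{\frac{\pi (t - nT_0)}{T_0}}$$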

The problems with this scheme are: 


1. A/D, D/A, filters cost money 
2. imperfections in these devices introduce errors 


Digital implementation of rate-changing according to this formula has three 
problems: 


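1. Infinite sum: The solution is to truncate. Consider $\mathrm{sinc}(t) \approx 0$ for 
$t < t_1$, $t > t_2$. Terms with $mT_1 - nT_0 < t_1$ or $mT_1 - nT_0 > t_2$ can then 
be dropped, which leaves $N_1 = \lceil (mT_1 - t_2)/T_0 \rceil \le n \le \lfloor (mT_1 - t_1)/T_0 \rfloor = N_2$ and 

$$x_1(m) = \sum_{n=N_1}^{N_2} x_0(n)\, \mathrm{sinc}_{T'}(mT_1 - nT_0)$$

Note: This is essentially lowpass filter design using a boxcar window: 
other finite-length filter design methods could be used for this. 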


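2. Lack of causality: The solution is to delay by $\max\{|N_i|\}$ samples. 
The mathematics of the analog portions of this system can be 
implemented digitally. 

Equation: 

$$\begin{aligned}
x_1(m) &= \left. h_{aa}(t) * x_a(t) \right|_{t = mT_1} \\
&= \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} x_0(n)\, \frac{\sin\left(\frac{\pi (mT_1 - \tau - nT_0)}{T_0}\right)}{\frac{\pi (mT_1 - \tau - nT_0)}{T_0}}\, \frac{\sin\left(\frac{\pi \tau}{T_1}\right)}{\pi \tau}\, d\tau
\end{aligned}$$

Equation: 

$$\begin{aligned}
x_1(m) &= \left. \frac{T_0}{T'} \sum_{n=-\infty}^{\infty} x_0(n)\, \frac{\sin\left(\frac{\pi}{T'} (mT_1 - nT_0)\right)}{\frac{\pi}{T'} (mT_1 - nT_0)} \right|_{T' = \max\{T_0, T_1\}} \\
&= \frac{T_0}{T'} \sum_{n=-\infty}^{\infty} x_0(n)\, \mathrm{sinc}_{T'}(mT_1 - nT_0)
\end{aligned}$$

So we have an all-digital formula for exact digital-to-digital rate 
changing! 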

3. Cost of computing $\mathrm{sinc}_{T'}(mT_1 - nT_0)$: The solution is to 
precompute the table of $\mathrm{sinc}(t)$ values. However, if $T_1/T_0$ is not a 
rational fraction, an infinite number of samples will be needed, so 
some approximation will have to be tolerated. 


Note: Rate transformation of any rate to any other rate can be 
accomplished digitally with arbitrary precision (if some delay is 
acceptable). This method is used in practice in many cases. We will 
examine a number of special cases and computational improvements, 
but in some sense everything that follows is detail; the above idea is 
the central idea in multirate signal processing. 
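A direct, unoptimized sketch of this all-digital rate changer follows. It assumes the formula $x_1(m) = \frac{T_0}{T'}\sum_n x_0(n)\,\mathrm{sinc}_{T'}(mT_1 - nT_0)$ with $\mathrm{sinc}_{T}(t) = \sin(\pi t/T)/(\pi t/T)$ and $T' = \max\{T_0, T_1\}$, truncates the sum to the available samples, and uses a cosine test signal; the function names and all numbers are illustrative.

```python
import math

def sinc_T(t, T):
    """sin(pi t / T) / (pi t / T), with the value 1 filled in at t = 0."""
    if t == 0.0:
        return 1.0
    u = math.pi * t / T
    return math.sin(u) / u

def resample(x0, T0, T1, m_values):
    """x1(m) = (T0/T') * sum_n x0(n) sinc_T'(m*T1 - n*T0), T' = max(T0, T1).

    The infinite sum is truncated to the samples actually present in x0.
    """
    Tp = max(T0, T1)
    return [(T0 / Tp) * sum(x0[n] * sinc_T(m * T1 - n * T0, Tp) for n in range(len(x0)))
            for m in m_values]

w = 0.3                                          # test frequency in radians per input sample
x0 = [math.cos(w * n) for n in range(300)]

# Rate increase by 3/2: output times m*T1 lie well inside the input record.
m_up = range(165, 286)
x_up = resample(x0, 1.0, 2.0 / 3.0, m_up)
err_up = max(abs(v - math.cos(w * m * 2.0 / 3.0)) for v, m in zip(x_up, m_up))

# Rate decrease by 2/3: the test tone is below the new Nyquist frequency.
m_down = range(74, 127)
x_down = resample(x0, 1.0, 1.5, m_down)
err_down = max(abs(v - math.cos(w * m * 1.5)) for v, m in zip(x_down, m_down))
print(err_up, err_down)
```

The residual error comes from truncating the slowly decaying sinc tails, which is the boxcar-window effect noted in the text; a tapered window would reduce it.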


Useful references for the traditional material (everything except PRFBs) are 
Crochiere and Rabiner, 1981 and Crochiere and Rabiner, 1983. A more 
recent tutorial is Vaidyanathan; see also Rioul and Vetterli. References to 
most of the original papers can be found in these tutorials. 


Interpolation, Decimation, and Rate Changing by Integer Fractions 


Interpolation: by an integer factor L 


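Interpolation means increasing the sampling rate, or filling in in-between samples. It is equivalent to 
sampling a bandlimited analog signal $L$ times faster. For the ideal interpolator, 
Equation: 

$$X_1(\omega) = \begin{cases} X_0(L\omega) & \text{if } |\omega| < \frac{\pi}{L} \\ 0 & \text{if } \frac{\pi}{L} \le |\omega| \le \pi \end{cases}$$

We wish to accomplish this digitally. Consider [link] and [link]. 
Equation: 

$$y(m) = \begin{cases} x_0\!\left(\frac{m}{L}\right) & \text{if } m \in \{0, \pm L, \pm 2L, \ldots\} \\ 0 & \text{otherwise} \end{cases}$$

denoted as 

[Figure: $x_0(n)$ → ↑L → $y(m)$]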


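The DTFT of $y(m)$ is 
Equation: 

$$\begin{aligned}
Y(\omega) &= \sum_{m=-\infty}^{\infty} y(m)\, e^{-j\omega m} \\
&= \sum_{n=-\infty}^{\infty} x_0(n)\, e^{-j\omega L n} \\
&= X_0(\omega L)
\end{aligned}$$

Since $X_0(\omega')$ is periodic with a period of $2\pi$, $X_0(L\omega) = Y(\omega)$ is periodic with a period of $\frac{2\pi}{L}$ (see 
[link]). 

[Figure: the spectra $X_0(\omega)$ and $Y(\omega)$; $Y(\omega)$ contains compressed replicas of $X_0$ centered at multiples of $2\pi/L$.]

By inserting zero samples between the samples of $x_0(n)$, we obtain a signal with a scaled frequency 
response that simply replicates $X_0(\omega')$ $L$ times over a $2\pi$ interval! 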
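The relation $Y(\omega) = X_0(L\omega)$ is easy to check numerically for a short sequence. In this sketch the test sequence, the factor $L = 3$, and the helper names `dtft` and `upsample` are illustrative assumptions.

```python
import cmath
import math

def dtft(x, w):
    """DTFT of a finite causal sequence x evaluated at frequency w."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))

def upsample(x, L):
    """Insert L-1 zeros after every sample of x."""
    y = [0.0] * (len(x) * L)
    y[::L] = x
    return y

x0 = [1.0, -0.5, 2.0, 0.25, -1.0]
L = 3
y = upsample(x0, L)

freqs = [0.1 * k for k in range(-30, 31)]
# Y(w) should equal X0(L w) at every frequency ...
max_dev = max(abs(dtft(y, w) - dtft(x0, L * w)) for w in freqs)
# ... and should therefore repeat every 2*pi/L.
period_dev = max(abs(dtft(y, w + 2 * math.pi / L) - dtft(y, w)) for w in freqs)
print(max_dev, period_dev)
```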


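Obviously, the desired $x_1(m)$ can be obtained simply by lowpass filtering $y(m)$ to remove the 
replicas. 
Equation: 

$$x_1(m) = y(m) * h_L(m)$$

Given 

$$H_L(\omega) = \begin{cases} 1 & \text{if } |\omega| < \frac{\pi}{L} \\ 0 & \text{if } \frac{\pi}{L} \le |\omega| \le \pi \end{cases}$$

In practice, a finite-length lowpass filter is designed using any of the methods studied so far ([link]). 

Interpolator Block Diagram 

[Figure: $x_0(n)$ → ↑L → $y(m)$ → $H_L$ → $x_1(m)$]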


Decimation: sampling rate reduction (by an integer factor M) 


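Let $y(m) = x_0(Mm)$ ([link]) 

[Figure: $x_0(n)$ → ↓M → $y(m)$]

That is, keep only every $M$th sample ([link]) 

[Figure: the sequences $x_0(n)$ and $y(m)$, with $y(m)$ retaining every $M$th sample of $x_0(n)$.]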

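In frequency (DTFT): 
Equation: 

$$\begin{aligned}
Y(\omega) &= \sum_{m=-\infty}^{\infty} y(m)\, e^{-j\omega m} \\
&= \sum_{m=-\infty}^{\infty} x_0(Mm)\, e^{-j\omega m} \\
&= \left. \sum_{n=-\infty}^{\infty} x_0(n) \left( \sum_{k=-\infty}^{\infty} \delta(n - Mk) \right) e^{-j\omega' n} \right|_{\omega' = \frac{\omega}{M}} \\
&= \left. \mathrm{DTFT}[x_0(n)] * \mathrm{DTFT}\!\left[ \sum_k \delta(n - Mk) \right] \right|_{\omega' = \frac{\omega}{M}}
\end{aligned}$$

Now $\mathrm{DTFT}\left[\sum_k \delta(n - Mk)\right] = \frac{2\pi}{M} \sum_{k=0}^{M-1} X(k)\, \delta\!\left(\omega' - \frac{2\pi k}{M}\right)$ over one $2\pi$ period, as shown in homework 
#1, where $X(k)$ is the DFT of one period of the periodic sequence. In this case, $X(k) = 1$ for 
$k \in \{0, 1, \ldots, M-1\}$ and $\mathrm{DTFT}\left[\sum_k \delta(n - Mk)\right] = \frac{2\pi}{M} \sum_{k=0}^{M-1} \delta\!\left(\omega' - \frac{2\pi k}{M}\right)$. 
Equation: 

$$\begin{aligned}
\mathrm{DTFT}[x_0(n)] * \mathrm{DTFT}\!\left[ \sum_k \delta(n - Mk) \right] &= X_0(\omega') * \frac{2\pi}{M} \sum_{k=0}^{M-1} \delta\!\left(\omega' - \frac{2\pi k}{M}\right) \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} X_0(\mu') \left( \frac{2\pi}{M} \sum_{k=0}^{M-1} \delta\!\left(\omega' - \mu' - \frac{2\pi k}{M}\right) \right) d\mu' \\
&= \frac{1}{M} \sum_{k=0}^{M-1} X_0\!\left(\omega' - \frac{2\pi k}{M}\right)
\end{aligned}$$

so $Y(\omega) = \frac{1}{M} \sum_{k=0}^{M-1} X_0\!\left(\frac{\omega}{M} - \frac{2\pi k}{M}\right)$, i.e., we get digital aliasing ([link]). 

[Figure: the spectra $X_0(\omega)$ and $Y(\omega)$; $Y(\omega)$ is the sum of $M$ stretched, shifted copies of $X_0$, which overlap unless $X_0$ is confined to $|\omega| < \pi/M$.]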

Usually, we prefer not to have aliasing, so the downsampler is preceded by a lowpass filter to 
remove all frequency components above $|\omega| = \frac{\pi}{M}$ ([link]). 
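The aliasing relation can be verified directly on a short sequence. The sketch below checks $Y(\omega) = \frac{1}{M}\sum_{k=0}^{M-1} X_0\!\left(\frac{\omega - 2\pi k}{M}\right)$ for $M = 3$; the test data and helper names are illustrative assumptions.

```python
import cmath
import math

def dtft(x, w):
    """DTFT of a finite causal sequence x evaluated at frequency w."""
    return sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(x))

x0 = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5, -0.5, 2.5, 1.0, -1.0]
M = 3
y = x0[::M]                      # keep every Mth sample

def aliased_sum(w):
    """(1/M) * sum over k of X0((w - 2*pi*k) / M)."""
    return sum(dtft(x0, (w - 2 * math.pi * k) / M) for k in range(M)) / M

freqs = [0.1 * k for k in range(-30, 31)]
max_dev = max(abs(dtft(y, w) - aliased_sum(w)) for w in freqs)
print(max_dev)
```

The factor $1/M$ matters: without it the right-hand side would be $M$ times too large, which the check above would expose.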


Rate-Changing by a Rational Fraction L/M 


This is easily accomplished by interpolating by a factor of L, then decimating by a factor of M 
([link]). 


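[Figure: $x_0(n)$ → ↑L → $H_L$ → $H_M$ → ↓M → $x_1(m)$]

The two lowpass filters can be combined into one LP filter with the lower cutoff, 

$$H(\omega) = \begin{cases} 1 & \text{if } |\omega| < \frac{\pi}{\max\{L, M\}} \\ 0 & \text{if } \frac{\pi}{\max\{L, M\}} \le |\omega| \le \pi \end{cases}$$

Obviously, the computational complexity and simplicity of implementation will depend on $\frac{L}{M}$: 2/3 
will be easier to implement than 1061/1060! 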


Efficient Multirate Filter Structures 

Rate-changing appears expensive computationally, since for both 
decimation and interpolation the lowpass filter is implemented at the higher 
rate. However, this is not necessary. 


Interpolation 


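For the interpolator, most of the samples in the upsampled signal are zero, 
and thus require no computation ([link]). 

[Figure: the upsampled signal $y(m)$ and the filter $h(k)$, with $x_1(m) = y(m) * h(m)$. Outputs at $m = Ln$ use only the taps $g_0(n)$, outputs at $m = Ln+1$ use only $g_1(n)$, outputs at $m = Ln+2$ use only $g_2(n)$, and so on.]

For $m = L \left\lfloor \frac{m}{L} \right\rfloor + (m \bmod L)$ and $p = m \bmod L$, 
Equation: 

$$x_1(m) = \sum_{k=N_1}^{N_2} h(k)\, y(m - k) = \sum_{k} g_p(k)\, x_0\!\left( \left\lfloor \tfrac{m}{L} \right\rfloor - k \right)$$

$$g_p(n) = h(Ln + p)$$

Pictorially, this can be represented as in [link]. 

[Figure: polyphase interpolator. $x_0(n)$ at the low rate feeds the $L$ filters $g_0, \ldots, g_{L-1}$; a commutator selects their outputs in turn to form $x_1(m)$ at $L$ times the input rate.]

These are called polyphase structures, and the $g_p(n)$ are called polyphase 
filters. 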
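To make the saving concrete, the sketch below computes the interpolator output twice: once by filtering the zero-stuffed signal directly, and once with the polyphase filters $g_p(n) = h(Ln + p)$ acting on $x_0(n)$ at the low rate. The causal linear-interpolation filter and the test data are illustrative assumptions.

```python
def interpolate_direct(x0, h, L):
    """Zero-stuff by L, then filter with h at the high rate."""
    y = [0.0] * (len(x0) * L)
    y[::L] = x0
    return [sum(h[j] * y[m - j] for j in range(len(h)) if m - j >= 0)
            for m in range(len(y))]

def interpolate_polyphase(x0, h, L):
    """x1(m) = sum_k g_p(k) x0(floor(m/L) - k) with p = m mod L, g_p(n) = h(Ln + p)."""
    g = [h[p::L] for p in range(L)]
    x1 = []
    for m in range(len(x0) * L):
        q, p = divmod(m, L)          # q = floor(m / L), p = m mod L
        x1.append(sum(g[p][k] * x0[q - k] for k in range(len(g[p])) if q - k >= 0))
    return x1

L = 3
h = [1.0 / 3, 2.0 / 3, 1.0, 2.0 / 3, 1.0 / 3]   # causal linear interpolator (delay 2)
x0 = [3.0, 6.0, 0.0, -3.0]

x_direct = interpolate_direct(x0, h, L)
x_poly = interpolate_polyphase(x0, h, L)
max_diff = max(abs(a - b) for a, b in zip(x_direct, x_poly))
print(x_poly)
```

Each polyphase output touches only about $N/L$ taps, whereas the direct form multiplies by all $N$ taps, most of them against inserted zeros.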


Computational cost 
If $h(m)$ is a length-$N$ filter: 

• No simplification: $\frac{N}{T_0 / L} = \frac{LN}{T_0}$ computations/sec 

• Polyphase structure: $L \cdot \frac{N}{L} \cdot \frac{1}{T_0} = \frac{N}{T_0}$ computations/sec, where $L$ is the 
number of filters, $\frac{N}{L}$ is the taps/filter, and $\frac{1}{T_0}$ is the rate. 


Thus we save a factor of $L$ by not being dumb. 


Note: For a given precision, $N$ is proportional to $L$ (why?), so the 
computational cost does increase with the interpolation rate. 


Note: Can similar computational savings be obtained with IIR structures? 


Efficient Decimation Structures 


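We only want every $M$th output, so we compute only the outputs of 
interest ([link]). 

$$x_1(m) = \sum_{k=N_1}^{N_2} x_0(Mm - k)\, h(k)$$

Polyphase Decimation Structure 

[Figure: polyphase decimator. A commutator distributes $x_0(n)$ in turn to $M$ polyphase filters running at the low rate; their outputs are summed to form $x_1(m)$.]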

The decimation structures are flow-graph reversals of the interpolation 
structure. Although direct implementation of the full filter for every Mth 
sample is obvious and straightforward, these polyphase structures give 
some idea as to how one might evenly partition the computation over M 
cycles. 
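The three ways of organizing the decimator can be compared in a short sketch: filtering at the full rate and discarding samples, computing only every $M$th output, and splitting the work over $M$ polyphase branches $e_p(k) = h(Mk + p)$. The branch notation $e_p$, the function names, and the test data are assumptions made for illustration.

```python
def decimate_full(x0, h, M):
    """Filter at the input rate, then keep every Mth output (wasteful)."""
    full = [sum(h[k] * x0[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x0))]
    return full[::M]

def decimate_direct(x0, h, M):
    """Compute only x1(m) = sum_k h(k) x0(M m - k)."""
    n_out = (len(x0) + M - 1) // M
    return [sum(h[k] * x0[M * m - k] for k in range(len(h)) if M * m - k >= 0)
            for m in range(n_out)]

def decimate_polyphase(x0, h, M):
    """Sum of M branches: branch p filters x0(M r - p) with e_p(k) = h(M k + p)."""
    n_out = (len(x0) + M - 1) // M
    x1 = [0.0] * n_out
    for p in range(M):
        e_p = h[p::M]
        for m in range(n_out):
            for k, tap in enumerate(e_p):
                idx = M * (m - k) - p      # input sample feeding this tap
                if idx >= 0:
                    x1[m] += tap * x0[idx]
    return x1

M = 3
h = [0.1, 0.2, 0.4, 0.2, 0.1, 0.05, -0.05]       # arbitrary test filter
x0 = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0, 2.0, 2.5, -3.0, 1.5, 0.0]

y_full = decimate_full(x0, h, M)
y_direct = decimate_direct(x0, h, M)
y_poly = decimate_polyphase(x0, h, M)
diff_direct = max(abs(a - b) for a, b in zip(y_full, y_direct))
diff_poly = max(abs(a - b) for a, b in zip(y_full, y_poly))
print(y_poly)
```

All three produce the same samples; the second and third avoid computing the $M - 1$ outputs per block that the downsampler would throw away.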


Efficient L/M rate changers 


Interpolate by $L$ and decimate by $M$ ([link]). 

[Figure: $x_0(n)$ → ↑L → $H_L$ → $H_M$ → ↓M → $x_1(m)$]

Combine the lowpass filters ([link]). 

[Figure: $x_0(n)$ → ↑L → $H$ → ↓M → $x_1(m)$]

We can couple the lowpass filter either to the interpolator or the decimator 
to implement it efficiently ([link]). 

[Figure: $x_0(n)$ → polyphase interpolator → ↓M → $x_1(m)$]

Of course we only compute the polyphase filter output selected by the 
decimator. 
Computational Cost 

Every $T_1 = \frac{M}{L} T_0$ seconds, compute one polyphase filter of length $\frac{N}{L}$, or 

$$\frac{N/L}{T_1} = \frac{N/L}{\frac{M}{L} T_0} = \frac{N}{M T_0}\ \frac{\text{multiplies}}{\text{second}}$$

However, note that $N$ is proportional to $\max\{L, M\}$. 


Filter Design for Multirate Systems 


The filter design techniques learned earlier can be applied to the design of filters in multirate 
systems, with a few twists. 


Example: 

Suppose we design a factor-of-$L$ interpolator for use in a CD player. We might wish that the out-of-band error 
be below the least significant bit, or 96 dB down, with < 0.05% error in the passband, so these 
specifications could be used for optimal $L_\infty$ filter design. 


In a CD player, the sampling rate is 44.1 kHz, corresponding to a Nyquist frequency of 22.05 kHz, 
but the sampled signal is bandlimited to 20 kHz. This leaves a small transition band, from 20 kHz to 
24.1 kHz. However, note that in any case where the signal spectrum is zero over some band, this 
introduces other zero bands in the scaled, replicated spectrum ([link]). 

[Figure: the spectrum $X_0(\omega)$ with a zero band near $\pm\pi$, and the scaled, replicated spectrum $Y(\omega)$ with corresponding zero bands around odd multiples of $\pi/L$.]

So we need only control the filter response in the stopbands over the frequency regions with nonzero 
energy ([link]). 

[Figure: the spectrum $Y(\omega)$ with the stopband regions of nonzero energy shaded; the gaps between them are "don't care" bands.]

The extra "don't care" bands allow a given set of specifications to be satisfied with a shorter-length 
filter. 

Direct polyphase filter design 


Note that in an integer-factor interpolator, each set of output samples $x_1(Ln + p)$,
$p = \{0, 1, \dots, L-1\}$, is created by a different polyphase filter $g_p(n)$, which has no interaction
with the other polyphase filters except in that they each interpolate the same signal. We can thus
treat the design of each polyphase filter independently, as an $\frac{N}{L}$-length filter design problem.
([link])


Each $g_p(n)$ produces samples $x_1(Ln + p) = x_0\left(n + \frac{p}{L}\right)$, where $n + \frac{p}{L}$ is not an integer. That is,
$g_p(n)$ is to produce an output signal (at a $T_0$ rate) that is $x_0(n)$ time-advanced by a non-integer
advance $\frac{p}{L}$.


The desired response of this polyphase filter is thus

$$H_{D_p}(\omega) = e^{\frac{i\omega p}{L}}$$

for $|\omega| \le \pi$, an all-pass filter with a linear, non-integer, phase. Each polyphase filter can be designed
independently to approximate this response according to any of the design criteria developed so far.
Exercise: 


Problem: What should the polyphase filter for p = 0 be? 


Solution: 


A delta function: $h_0(n) = \delta(n')$


Example: 
Least-squares Polyphase Filter Design 


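• Deterministic $x(n)$: minimize the squared error between the actual output $x(n) * h(n)$ and the
desired output $x_d(n) = x(n) * h_d(n)$. Using Parseval's theorem, this becomes
Equation:

$$\min\left\{\sum_{n=-\infty}^{\infty} \left|x(n)*h(n) - x_d(n)\right|^2\right\}
= \min\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \left|X(\omega)H(\omega) - X(\omega)H_d(\omega)\right|^2 d\omega\right\}
= \min\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \left|H(\omega) - H_d(\omega)\right|^2 \left|X(\omega)\right|^2 d\omega\right\}$$

This is simply weighted least squares design, with $|X(\omega)|^2$ as the weighting function.

• Stochastic $x(n)$
Equation:

$$\min\left\{E\left[\left|x(n)*h(n) - x_d(n)\right|^2\right]\right\}
= \min\left\{E\left[\left|x(n)*\left(h(n) - h_d(n)\right)\right|^2\right]\right\}
= \min\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi} \left|H_d(\omega) - H(\omega)\right|^2 S_{xx}(\omega)\, d\omega\right\}$$

$S_{xx}(\omega)$ is the power spectral density of $x$.

$$S_{xx}(\omega) = \mathrm{DTFT}\left[r_{xx}(k)\right]$$
$$r_{xx}(k) = E\left[x(k + l)\,x^*(l)\right]$$

Again, a weighted least squares filter design problem.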
Exercise: 
Problem: Is it feasible to use IIR polyphase filters? 


Solution: 


The recursive feedback of previous outputs means that portions of each IIR polyphase filter 
must be computed for every input sample; this usually makes IIR filters more expensive than 
FIR implementations. 


Multistage Multirate Systems 


Multistage multirate systems are often more efficient. Suppose one wishes
to decimate a signal by an integer factor $M$, where $M$ is a composite
integer $M = M_1 M_2 \cdots M_p = \prod_{i=1}^{p} M_i$. A decimator can be implemented
in a multistage fashion by first decimating by a factor $M_1$, then decimating
this signal by a factor $M_2$, etc. ([link])

[Figure: Multistage decimator]


Multistage implementations are of significant practical interest only if they 
offer significant computational savings. In fact, they often do! 


The computational cost of a single-stage decimator is:

$$\frac{N}{M T_0} \ \frac{\text{taps}}{\text{sec}}$$

The computational cost of a multistage decimator is:

$$\frac{N_1}{M_1 T_0} + \frac{N_2}{M_1 M_2 T_0} + \cdots + \frac{N_p}{M T_0}$$

The first term is the most significant, since the rate is highest. Since
$N_i \propto M_i$ for a lowpass filter, it is not immediately clear that a multistage
system should require less computation. However, the multistage structure
relaxes the requirements on the filters, which reduces their length and
makes the overall computation less.


Filter design for Multi-stage Structures 


Ostensibly, the first-stage filter must be a lowpass filter with a cutoff at $\frac{\pi}{M_1}$
to prevent aliasing after the downsampler. However, note that aliasing
outside the final overall passband $|\omega| < \frac{\pi}{M}$ is of no concern, since it will
be removed by later stages. We only need prevent aliasing into the band
$|\omega| < \frac{\pi}{M}$; thus we need only specify the passband over the interval
$|\omega| < \frac{\pi}{M}$, and the stopband over the intervals
$\omega \in \left[\frac{2\pi k}{M_1} - \frac{\pi}{M},\ \frac{2\pi k}{M_1} + \frac{\pi}{M}\right]$
for $k \in \{1, \dots, M-1\}$. ([link])


[Figure: first-stage filter specification, with the frequency axis marked at $\pi/M$, $\pi/M_1$, $2\pi/M_1$, $4\pi/M_1$]


Of course, we don't want gain in the transition bands, since this would need 
to be suppressed later, but otherwise we don't care about the response in 
those regions. Since the transition bands are so large, the required filter 
turns out to be quite short. The final stage has no "don't care" regions; 
however, it is operating at a low rate, so it is relatively unimportant if the 
final filter turns out to be rather long! 


L-infinity Tolerances on the Pass and Stopbands 


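The overall response is a cascade of multiple filters, so the worst-case
overall passband deviation, assuming all the peaks just happen to line up, is

$$1 + \delta_{p_{ov}} = \prod_{i=1}^{p}\left(1 + \delta_{p_i}\right)$$

So one could choose all filters to have equal specifications and require, for
each stage filter and for $\delta_{p_{ov}} \ll 1$,

$$1 + \delta_{p_i} \le \sqrt[p]{1 + \delta_{p_{ov}}} \approx 1 + p^{-1}\delta_{p_{ov}}$$

$$1 - \delta_{p_i} \ge \sqrt[p]{1 - \delta_{p_{ov}}} \approx 1 - p^{-1}\delta_{p_{ov}}$$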


Alternatively, one can design later stages (at lower computation rates) to 
compensate for the passband ripple in earlier stages to achieve 
exceptionally accurate passband response. 


$\delta_s$ remains essentially unchanged, since the worst-case scenario is for the
error to alias into the passband and undergo no further suppression in 
subsequent stages. 


Interpolation 


Interpolation is the flow-graph reversal of the multi-stage decimator. The
first stage has a cutoff at $\frac{\pi}{L}$ ([link]):




However, all subsequent stages have large bands without signal energy, due 
to the earlier stages ([link]): 




The order of the filters is reversed, but otherwise the filters are identical to 
the decimator filters. 


Efficient Narrowband Lowpass Filtering 


A very narrow lowpass filter requires a very long FIR filter to achieve 
reasonable resolution in the frequency response. However, were the input 
sampled at a lower rate, the cutoff frequency would be correspondingly 
higher, and the filter could be shorter! 


The transition band is also broader, which helps as well. Thus, [link] can be 
implemented as [link]. 




and in practice the inner lowpass filter can be coupled to the decimator or 
interpolator filters. If the decimator and interpolator are implemented as 
multistage structures, the overall algorithm can be dramatically more 
efficient than direct implementation! 


DFT-Based Filterbanks 


One common application of multirate processing arises in multirate, multi- 
channel filter banks ([link]). 




One application is separating frequency-division-multiplexed channels. If 
the filters are narrowband, the output channels can be decimated without 
significant aliasing. 


Such structures are especially attractive when they can be implemented
efficiently. For example, if the filters are simply frequency modulated (by
$e^{-\left(i\frac{2\pi k}{L}n\right)}$) versions of each other, they can be efficiently implemented using
FFTs!


Furthermore, there are classes of filters called perfect reconstruction 
filters which are of finite length but from which the signal can be 
reconstructed exactly (using all M channels), even though the output of 
each channel experiences aliasing in the decimation step. These types of 
filterbanks have received a lot of research attention, culminating in wavelet 
theory and techniques. 


Uniform DFT Filter Banks 


Suppose we wish to split a digital input signal into $N$ frequency bands,
uniformly spaced at center frequencies $\omega_k = \frac{2\pi k}{N}$, for $0 \le k \le N-1$.
Consider also a lowpass filter $h(n)$,

$$H(\omega) \approx \begin{cases} 1 & \text{if } |\omega| < \frac{\pi}{N} \\ 0 & \text{otherwise} \end{cases}$$

Bandpass filters can be constructed which have the frequency response

$$H_k(\omega) = H\left(\omega + \frac{2\pi k}{N}\right)$$

from

$$h_k(n) = h(n)\,e^{-\left(i\frac{2\pi k n}{N}\right)}$$

The output of the $k$th bandpass filter is simply (assume $h(n)$ are FIR)
Equation:

$$x(n) * h_k(n) = \sum_{m=0}^{M-1} x(n-m)\,h(m)\,e^{-\left(i\frac{2\pi k m}{N}\right)} = y_k(n)$$

This looks suspiciously like a DFT, except that $M \ne N$, in general.
However, if we fix $M = N$, then we can compute all $y_k(n)$ outputs
simultaneously using an FFT of $x(n-m)h(m)$: the
$k$th FFT frequency output $= y_k(n)$! So the cost of computing all of
these filter bank outputs is $O[N \log N]$, rather than $N^2$, for a given $n$.
This is very useful for efficient implementation of transmultiplexors
(FDM to TDM).
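
The claim that one length-$N$ FFT produces all $N$ channel outputs at a given time can be checked with a short NumPy sketch. The prototype filter below (a Hamming window) is only a placeholder, not an optimized lowpass design, and the names `bank_outputs_direct` and `bank_outputs_fft` are illustrative.

```python
import numpy as np

def bank_outputs_direct(x, h, n):
    """y_k(n) for k = 0..N-1, filtering with each modulated filter h_k in turn."""
    N = len(h)
    m = np.arange(N)
    y = np.empty(N, dtype=complex)
    for k in range(N):
        h_k = h * np.exp(-2j * np.pi * k * m / N)   # h_k(m) = h(m) e^{-i 2 pi k m / N}
        y[k] = np.sum(x[n - m] * h_k)               # convolution sum at time n
    return y

def bank_outputs_fft(x, h, n):
    """The same N outputs from one length-N FFT of x(n - m) h(m)."""
    m = np.arange(len(h))
    return np.fft.fft(x[n - m] * h)

rng = np.random.default_rng(1)
N = 8
h = np.hamming(N)            # placeholder lowpass prototype (assumption)
x = rng.standard_normal(64)
n = 20                       # any time index with n >= N - 1
print(np.allclose(bank_outputs_direct(x, h, n), bank_outputs_fft(x, h, n)))
```

The direct version costs on the order of $N^2$ multiplies per time index, while the FFT version needs $N$ multiplies for the windowing plus one FFT.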

Exercise: 


Problem: 
How would we implement this efficiently if we wanted to decimate the 


individual channels $y_k(n)$ by a factor of $N$, to their approximate
Nyquist bandwidth? 


Solution: 


Simply step by N time samples between FFTs. 
Exercise: 
Problem: 


Do you expect significant aliasing? If so, how do you propose to 
combat it? Efficiently? 


Solution: 
Aliasing should be expected. There are two ways to reduce it: 


1. Decimate by less ("oversample" the individual channels), such as
decimating by a factor of $\frac{N}{2}$. This is efficiently done by
time-stepping by the appropriate factor.

2. Design better (and thus longer) filters, say of length LN. These 
can be efficiently computed by producing only N (every Lth) 
FFT outputs using simplified FFTs. 


Exercise: 


Problem: 


How might one convert from N input channels into an FDM signal 
efficiently? ([link]) 


Note: Such systems are used throughout the telephone system, 
satellite communication links, etc. 


Solution: 


Use an FFT and an inverse FFT for the modulation (TDM to FDM) 
and demodulation (FDM to TDM), respectively. 


Quadrature Mirror Filterbanks (QMF) 


Although the DFT filterbanks are widely used, there is a problem with aliasing in the decimated channels. 
At first glance, one might think that this is an insurmountable problem and must simply be accepted. 
Clearly, with FIR filters and maximal decimation, aliasing will occur. However, a simple example will show 
that it is possible to exactly cancel out aliasing under certain conditions!


Consider the following trivial filterbank system, with two channels. ([link])


Note $\hat{x}(n) = x(n)$ with no error whatsoever, although clearly aliasing occurs in both channels! Note that
the overall data rate is still the Nyquist rate, so there are clearly enough degrees of freedom available to 
reconstruct the data, if the filterbank is designed carefully. However, this isn't splitting the data into separate 
frequency bands, so one questions whether something other than this trivial example could work. 


Let's consider a general two-channel filterbank, and try to determine conditions under which aliasing can be 
cancelled, and the signal can be reconstructed perfectly ([link]). 


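Let's derive $\hat{x}(n)$, using z-transforms, in terms of the components of this system. Recall ([link]) is
equivalent to

[Figure: $x(n)$ filtered by $H(z)$ to give $y(n)$]

$$Y(z) = H(z)X(z), \qquad Y(\omega) = H(\omega)X(\omega)$$

and note that ([link]) is equivalent to

[Figure: $x(n)$ upsampled by $L$ to give $y(m)$, which equals $x(m/L)$ when $m$ is a multiple of $L$ and $0$ otherwise]

$$Y(z) = \sum_{m=-\infty}^{\infty} x(m)\,z^{-Lm} = X\left(z^L\right), \qquad Y(\omega) = X(L\omega)$$

and ([link]) is equivalent to

[Figure: $x(n)$ downsampled by $M$ to give $y(m) = x(Mm)$]

$$Y(z) = \frac{1}{M}\sum_{k=0}^{M-1} X\left(z^{\frac{1}{M}} W_M^k\right), \qquad W_M = e^{-i\frac{2\pi}{M}}$$

$Y(z)$ is derived in the downsampler as follows:

$$Y(z) = \sum_{m=-\infty}^{\infty} x(Mm)\,z^{-m}$$

Let $n = Mm$ and $m = \frac{n}{M}$, then

$$Y(z) = \sum_{n=-\infty}^{\infty} x(n) \sum_{p=-\infty}^{\infty} \delta(n - Mp)\, z^{-\frac{n}{M}}$$

Now
Equation:

$$x(n)\sum_{p=-\infty}^{\infty}\delta(n - Mp)
= \mathrm{IDTFT}\left[\frac{1}{2\pi}X(\omega) \circledast \frac{2\pi}{M}\sum_{k=0}^{M-1}\delta\left(\omega - \frac{2\pi k}{M}\right)\right]
= \mathrm{IDTFT}\left[\frac{1}{M}\sum_{k=0}^{M-1} X\left(\omega - \frac{2\pi k}{M}\right)\right]
= \frac{1}{M}\sum_{k=0}^{M-1} x(n)\,W_M^{-nk}$$

so
Equation:

$$Y(z) = \sum_{n=-\infty}^{\infty}\left(\frac{1}{M}\sum_{k=0}^{M-1} x(n)\,W_M^{-nk}\right) z^{-\frac{n}{M}}
= \frac{1}{M}\sum_{k=0}^{M-1}\sum_{n=-\infty}^{\infty} x(n)\left(z^{\frac{1}{M}} W_M^{k}\right)^{-n}
= \frac{1}{M}\sum_{k=0}^{M-1} X\left(z^{\frac{1}{M}} W_M^{k}\right)$$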

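Armed with these results, let's determine $\hat{X}(z) \Leftrightarrow \hat{x}(n)$. ([link])


Note that the upper channel, after filtering by $H_0(z)$, downsampling by 2, upsampling by 2, and filtering
by $F_0(z)$, produces (using $W_2 = e^{-i\frac{2\pi}{2}} = -1$)

$$U_4(z) = \frac{1}{2}F_0(z)H_0(z)X(z) + \frac{1}{2}F_0(z)H_0(-z)X(-z)$$

and the lower channel produces

$$L_4(z) = \frac{1}{2}F_1(z)H_1(z)X(z) + \frac{1}{2}F_1(z)H_1(-z)X(-z)$$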


Finally then, 
Equation: 


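$$\begin{aligned}
\hat{X}(z) &= U_4(z) + L_4(z) \\
&= \frac{1}{2}\left(H_0(z)F_0(z)X(z) + H_0(-z)F_0(z)X(-z) + H_1(z)F_1(z)X(z) + H_1(-z)F_1(z)X(-z)\right) \\
&= \frac{1}{2}\left(H_0(z)F_0(z) + H_1(z)F_1(z)\right)X(z) + \frac{1}{2}\left(H_0(-z)F_0(z) + H_1(-z)F_1(z)\right)X(-z)
\end{aligned}$$


Note that the $X(-z) \rightarrow X(\omega + \pi)$ corresponds to the aliasing terms!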
There are four things we would like to have: 


1. No aliasing distortion 

2. No phase distortion (overall linear phase → simple time delay)
3. No amplitude distortion 

4. FIR filters 


No aliasing distortion 


By insisting that $H_0(-z)F_0(z) + H_1(-z)F_1(z) = 0$, the $X(-z)$ component of $\hat{X}(z)$ can be removed, and
all aliasing will be eliminated! There may be many choices for $H_0$, $H_1$, $F_0$, $F_1$ that eliminate aliasing, but
most research has focused on the choice

$$F_0(z) = H_1(-z), \qquad F_1(z) = -H_0(-z)$$
We will consider only this choice in the following discussion. 


Phase distortion 

The transfer function of the filter bank, with aliasing cancelled, becomes
$T(z) = \frac{1}{2}\left(H_0(z)F_0(z) + H_1(z)F_1(z)\right)$, which with the above choice becomes
$T(z) = \frac{1}{2}\left(H_0(z)H_1(-z) - H_1(z)H_0(-z)\right)$. We would like $T(z)$ to correspond to a linear-phase filter to
eliminate phase distortion. Call

$$P(z) = H_0(z)H_1(-z)$$

Note that

$$T(z) = \frac{1}{2}\left(P(z) - P(-z)\right)$$

Note that $P(-z) \Leftrightarrow (-1)^n p(n)$, and that if $p(n)$ is a linear-phase filter, $(-1)^n p(n)$ is also (perhaps of the
opposite symmetry). Also note that the sum of two linear-phase filters of the same symmetry (i.e., length of
$p(n)$ must be odd) is also linear phase, so if $p(n)$ is an odd-length linear-phase filter, there will be no phase
distortion. Also note that

$$P(z) - P(-z) \Leftrightarrow p(n) - (-1)^n p(n) = \begin{cases} 2p(n) & \text{if } n \text{ is odd} \\ 0 & \text{if } n \text{ is even} \end{cases}$$

which is zero when $n$ is even. If we choose $h_0(n)$ and $h_1(n)$ to be linear phase, $p(n)$ will also be linear
phase. Thus by choosing $h_0(n)$ and $h_1(n)$ to be FIR linear phase, we eliminate phase distortion and get FIR
filters as well (condition 4).


Amplitude distortion 
Assuming aliasing cancellation and elimination of phase distortion, we might also desire no amplitude
distortion ($|T(\omega)| = 1$). All of these conditions require

$$T(z) = \frac{1}{2}\left(H_0(z)H_1(-z) - H_1(z)H_0(-z)\right) = c\,z^{-D}$$

where $c$ is some constant and $D$ is a linear phase delay. $c = 1$ for $|T(\omega)| = 1$. It can be shown
that the following can be satisfied:

$$2T(z) = P(z) - P(-z) = 2c\,z^{-D} \Leftrightarrow \begin{cases} 2p(n) = 2c\,\delta(n - D) & \text{if } n \text{ is odd} \\ p(n) = \text{anything} & \text{if } n \text{ is even} \end{cases}$$

Thus we require

$$P(z) = \sum_{n=0}^{N'} p(2n)\,z^{-2n} + z^{-D}$$

Any factorization of a $P(z)$ of this form, $P(z) = A(z)B(z)$, can lead to a Perfect Reconstruction filter bank
of the form

$$H_0(z) = A(z)$$
$$H_1(-z) = B(z)$$

[This result is attributed to Vetterli.] A well-known special case (Smith and Barnwell) is

$$H_1(z) = -\left(z^{-(2D)+1} H_0\left(-z^{-1}\right)\right)$$


Design techniques exist for optimally choosing the coefficients of these filters, under all of these constraints. 
Equation: 
Quadrature Mirror Filters 


$$H_1(z) = H_0(-z) \Leftrightarrow H_1(\omega) = H_0(\pi + \omega) = H_0^*(\pi - \omega)$$

for real-valued filters. The frequency response is "mirrored" around $\omega = \frac{\pi}{2}$. This choice leads to
$T(z) = \frac{1}{2}\left(H_0^2(z) - H_0^2(-z)\right)$: it can be shown that this can be a perfect reconstruction system only if

$$H_0(z) = c_0 z^{-2n_0} + c_1 z^{-(2n_1+1)}$$


which isn't a very flexible choice of filters, and not a very good lowpass! The Smith and Barnwell approach 
is more commonly used today. 
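
A two-tap example makes the alias cancellation concrete. The sketch below assumes the Haar pair $h_0 = (1, 1)/\sqrt{2}$, $h_1 = (1, -1)/\sqrt{2}$, which satisfies $H_1(z) = H_0(-z)$, together with the synthesis choice $F_0(z) = H_1(-z)$, $F_1(z) = -H_0(-z)$. The helper names `channel` and `filterbank` are illustrative. For this pair, $T(z) = z^{-1}$, so the output should be the input delayed by one sample.

```python
import numpy as np

h0 = np.array([1.0, 1.0]) / np.sqrt(2)   # lowpass analysis filter (Haar, assumed)
alt = np.array([1.0, -1.0])              # multiplying by (-1)^n maps H(z) to H(-z)
h1 = h0 * alt                            # H1(z) = H0(-z)
f0 = h1 * alt                            # F0(z) = H1(-z)
f1 = -(h0 * alt)                         # F1(z) = -H0(-z)

def channel(x, h, f):
    """Analysis filter, downsample by 2, upsample by 2, synthesis filter."""
    a = np.convolve(x, h)
    u = np.zeros_like(a)
    u[::2] = a[::2]                      # keep even samples, zero the odd ones
    return np.convolve(u, f)

def filterbank(x):
    """Sum of the two channels; each channel alone contains aliasing."""
    x = np.asarray(x, dtype=float)
    return channel(x, h0, f0) + channel(x, h1, f1)

x = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0])
x_hat = filterbank(x)
print(np.allclose(x_hat[1:len(x) + 1], x))   # input recovered with a one-sample delay
```

The reconstruction is exact even though each decimated channel is aliased, but the Haar filters separate the two bands only crudely, which is the inflexibility the text refers to.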


M-Channel Filter Banks 


The theory of M-band QMFBs and PRFBs has been investigated recently. 
Some results are available. 


Tree-structured filter banks 


Once we have a two-band PRFB, we can continue to split the subbands 
with similar systems! ([link]) 




Thus we can recursively decompose a signal into $2^p$ bands, each sampled at
$2^{-p}$ times the rate of the original signal, and reconstruct exactly! Due to the tree
structure, this can be quite efficient, and in fact close to the efficiency of an 
FFT filter bank, which does not have perfect reconstruction. 


Wavelet decomposition 


We need not split both the upper-frequency and lower-frequency bands 
identically. ([link]) 


[Figure: wavelet decomposition of $x(n)$ in which only the low-frequency branch is split again at each level, giving high, low-high, and low-low-low frequency bands]


This is good for image coding, because the energy tends to be distributed 
such that after a wavelet decomposition, each band has roughly equal 
energy. 


Filter Structures 


A realizable filter must require only a finite number of computations per
output sample. For linear, causal, time-invariant filters, this restricts one to
rational transfer functions of the form

$$H(z) = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_N z^{-N}}$$

Assuming no pole-zero cancellations, $H(z)$ is FIR if $a_i = 0$ for all $i > 0$,
and IIR otherwise. Filter structures usually implement rational transfer 
functions as difference equations. 


Whether FIR or IIR, a given transfer function can be implemented with 
many different filter structures. With infinite-precision data, coefficients, 
and arithmetic, all filter structures implementing the same transfer function 
produce the same output. However, different filter structures may produce
very different errors with quantized data and finite-precision or fixed-point 
arithmetic. The computational expense and memory usage may also differ 
greatly. Knowledge of different filter structures allows DSP engineers to 
trade off these factors to create the best implementation. 


FIR Filter Structures 


Consider causal FIR filters: $y(n) = \sum_{k=0}^{M-1} h(k)\,x(n-k)$; this can be realized using
the following structure

[Figure: direct-form FIR filter structure]

or in a different notation

[Figure: direct-form FIR filter structure in flow-graph notation. Legend: $z^{-1}$ = register, unit delay; arrow labelled with a coefficient such as $h(0)$ = multiply; arrows meeting at a node = addition]


This is called the direct-form FIR filter structure. 


There are no closed loops (no feedback) in this structure, so it is called a non- 
recursive structure. Since any FIR filter can be implemented using the direct-form, 
non-recursive structure, it is always possible to implement an FIR filter non- 
recursively. However, it is also possible to implement an FIR filter recursively, and 
for some special sets of FIR filter coefficients this is much more efficient. 


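Example:

$$y(n) = \sum_{k=0}^{M-1} x(n-k)$$

where

$$h(k) = 1, \quad 0 \le k \le M-1$$

But note that

$$y(n) = y(n-1) + x(n) - x(n-M)$$

This can be implemented as

[Figure: recursive implementation of the length-$M$ running sum]

Instead of costing $M - 1$ adds/output point, this comb filter costs only two
adds/output.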
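
The two implementations of the length-$M$ running sum can be compared directly. This is a small pure-Python sketch with zero initial conditions; the function names are illustrative. Integer inputs are used so that both versions agree exactly.

```python
def running_sum_direct(x, M):
    """Non-recursive form: y(n) = x(n) + x(n-1) + ... + x(n-M+1)."""
    return [sum(x[n - k] for k in range(M) if n - k >= 0)
            for n in range(len(x))]

def running_sum_recursive(x, M):
    """Recursive form: y(n) = y(n-1) + x(n) - x(n-M), two adds per output."""
    y = []
    previous = 0
    for n in range(len(x)):
        oldest = x[n - M] if n >= M else 0   # x(n-M) is zero before the signal starts
        previous = previous + x[n] - oldest
        y.append(previous)
    return y

x = [3, 1, 4, 1, 5, 9, 2, 6]
print(running_sum_direct(x, 3))
print(running_sum_recursive(x, 3))
```

The recursive version keeps a single accumulator, so its cost per output does not grow with $M$.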


Exercise: 
Problem: Is this stable, and if not, how can it be made so? 


IIR filters must be implemented with a recursive structure, since that's the only way a 
finite number of elements can generate an infinite-length impulse response in a linear, 
time-invariant (LTI) system. Recursive structures have the advantages of being able 
to implement IIR systems, and sometimes greater computational efficiency, but the 
disadvantages of possible instability, limit cycles, and other deleterious effects that
we will study shortly. 


Transpose-form FIR filter structures 


The flow-graph-reversal theorem says that if one changes the directions of all the
arrows, and inputs at the output and takes the output from the input of a reversed
flow-graph, the new system has an identical input-output relationship to the original
flow-graph.

[Figure: Direct-form FIR structure]

[Figure: reverse = transpose-form FIR filter structure]


or redrawn 


Cascade structures 
The z-transform of an FIR filter can be factored into a cascade of short-length filters

$$b_0 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_m z^{-m} = b_0\left(1 - z_1 z^{-1}\right)\left(1 - z_2 z^{-1}\right)\cdots\left(1 - z_m z^{-1}\right)$$

where the $z_i$ are the zeros of this polynomial. Since the coefficients of the polynomial
are usually real, the roots are usually complex-conjugate pairs, so we generally
combine $\left(1 - z_i z^{-1}\right)\left(1 - z_i^* z^{-1}\right)$ into one quadratic (length-3) section with real
coefficients

$$\left(1 - z_i z^{-1}\right)\left(1 - z_i^* z^{-1}\right) = 1 - 2\,\mathrm{Re}(z_i)\,z^{-1} + |z_i|^2 z^{-2} = H_i(z)$$


The overall filter can then be implemented in a cascade structure. 


This is occasionally done in FIR filter implementation when one or more of the short- 
length filters can be implemented efficiently. 


Lattice Structure 


It is also possible to implement FIR filters in a lattice structure: this is sometimes 
used in adaptive filtering 




IIR Filter Structures 


IIR (Infinite Impulse Response) filter structures must be recursive (use feedback); an infinite number of 
coefficients could not otherwise be realized with a finite number of computations per sample. 


$$H(z) = \frac{N(z)}{D(z)} = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_N z^{-N}}$$


The corresponding time-domain difference equation is

$$y(n) = -a_1 y(n-1) - a_2 y(n-2) - \cdots - a_N y(n-N) + b_0 x(n) + b_1 x(n-1) + \cdots + b_M x(n-M)$$


Direct-form I IIR Filter Structure 


The difference equation above is implemented directly as written by the Direct-Form I IIR Filter Structure. 


[Figure: Direct-Form I IIR filter structure, a cascade of $N(z)$ and $\frac{1}{D(z)}$]


Note that this is a cascade of two systems, $N(z)$ and $\frac{1}{D(z)}$. If we reverse the order of the filters, the overall
system is unchanged: The memory elements appear in the middle and store identical values, so they can be
combined, to form the Direct-Form II IIR Filter Structure.


Direct-Form II IIR Filter Structure 


This structure is canonic (i.e., it requires the minimum number of memory elements).
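
The relationship between the two direct forms can be sketched in a few lines of Python. This assumes zero initial conditions, coefficient lists `b = [b0, ..., bM]` and `a = [a1, ..., aN]` (with the leading denominator coefficient 1 implied), and illustrative function names. Direct-Form I keeps separate histories of past inputs and outputs, while Direct-Form II keeps one shared delay line.

```python
def direct_form_1(b, a, x):
    """y(n) = sum_k b_k x(n-k) - sum_k a_k y(n-k), using x and y histories."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y

def direct_form_2(b, a, x):
    """Same filter with 1/D(z) first and N(z) second, sharing one delay line."""
    order = max(len(a), len(b) - 1)
    w = [0.0] * order                     # w[0] = w(n-1), w[1] = w(n-2), ...
    y = []
    for xn in x:
        w0 = xn - sum(a[k] * w[k] for k in range(len(a)))               # recursive part
        yn = b[0] * w0 + sum(b[k] * w[k - 1] for k in range(1, len(b)))  # feed-forward part
        w = ([w0] + w)[:order]            # shift the delay line by one sample
        y.append(yn)
    return y

b = [0.5, 0.2, 0.1]
a = [-0.3, 0.4]
x = [1.0, 0.0, 0.0, 2.0, -1.0, 0.5, 0.0, 0.0]
y1 = direct_form_1(b, a, x)
y2 = direct_form_2(b, a, x)
print(max(abs(p - q) for p, q in zip(y1, y2)))
```

In exact arithmetic the two outputs are identical. In floating point they agree to within rounding, and the difference between structures only becomes significant once data and coefficients are quantized.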


Flowgraph reversal gives the 


Transpose-Form IIR Filter Structure 




Usually we design IIR filters with N = M, but not always. 


Obviously, since all these structures have identical frequency response, filter structures are not unique. We 
consider many different structures because 


1. Depending on the technology or application, one might be more convenient than another 
2. The response in a practical realization, in which the data and coefficients must be quantized, may differ 
substantially, and some structures behave much better than others with quantization. 


The Cascade-Form IIR filter structure is one of the least sensitive to quantization, which is why it is the most 
commonly used IIR filter structure. 


IIR Cascade Form 
The numerator and denominator polynomials can be factored

$$H(z) = \frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}} = \frac{b_0 \prod_{k=1}^{M}\left(z - z_k\right)}{z^{M-N}\prod_{k=1}^{N}\left(z - p_k\right)}$$

and implemented as a cascade of short IIR filters.


Since the filter coefficients are usually real yet the roots are mostly complex, we actually implement these as
second-order sections, where complex-conjugate pole and zero pairs are combined into second-order sections
with real coefficients. The second-order sections are usually implemented with either the Direct-Form II or
Transpose-Form structure.


Parallel form 


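A rational transfer function can also be written as

$$\frac{b_0 + b_1 z^{-1} + \cdots + b_M z^{-M}}{1 + a_1 z^{-1} + \cdots + a_N z^{-N}} = c_0 + c_1 z^{-1} + \cdots + \frac{A_1}{z - p_1} + \frac{A_2}{z - p_2} + \cdots + \frac{A_N}{z - p_N}$$

which by linearity can be implemented as

[Figure: parallel-form structure]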


As before, we combine complex-conjugate pole pairs into second-order sections with real coefficients. 


The cascade and parallel forms are of interest because they are much less sensitive to coefficient quantization 
than higher-order structures, as analyzed in later modules in this course. 


Other forms 


There are many other structures for IIR filters, such as wave digital filter structures, lattice-ladder,
all-pass-based forms, and so forth. These are the result of extensive research to find structures which are
computationally efficient and insensitive to quantization error. They all represent various tradeoffs; the best 
choice in a given context is not yet fully understood, and may never be. 


State-Variable Representation of Discrete-Time Systems


State and the State-Variable Representation 


State: the minimum additional information at time $n$ which, along with all current and future input values, is
necessary to compute all future outputs.


Essentially, the state of a system is the information held in the delay registers in a filter structure or signal 
flow graph. 


Note: Any LTI (linear, time-invariant) system of finite order M can be represented by a state-variable 
description 


$$x(n+1) = Ax(n) + Bu(n)$$
$$y(n) = Cx(n) + Du(n)$$


where $x$ is an $M \times 1$ "state vector," $u(n)$ is the input at time $n$, $y(n)$ is the output at time $n$; $A$ is an $M \times M$
matrix, $B$ is an $M \times 1$ vector, $C$ is a $1 \times M$ vector, and $D$ is a $1 \times 1$ scalar.


One can always obtain a state-variable description of a signal flow graph. 


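Example:
3rd-Order IIR

$$y(n) = -a_1 y(n-1) - a_2 y(n-2) - a_3 y(n-3) + b_0 u(n) + b_1 u(n-1) + b_2 u(n-2) + b_3 u(n-3)$$

[Figure: Direct-Form II realization with input $u(n)$, output $y(n)$, and the three delay-register contents as the states $x_1(n)$, $x_2(n)$, $x_3(n)$]

$$\begin{pmatrix} x_1(n+1) \\ x_2(n+1) \\ x_3(n+1) \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -a_3 & -a_2 & -a_1 \end{pmatrix}\begin{pmatrix} x_1(n) \\ x_2(n) \\ x_3(n) \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} u(n)$$

$$y(n) = \begin{pmatrix} b_3 - a_3 b_0 & b_2 - a_2 b_0 & b_1 - a_1 b_0 \end{pmatrix} x(n) + \left(b_0\right) u(n)$$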
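
One way to check a state-variable description is to simulate it against the difference equation. The sketch below assumes the Direct-Form II choice of states for a 3rd-order filter, with $x_1$, $x_2$, $x_3$ holding the delay-line contents from oldest to newest, and zero initial state. The matrices in `state_space_df2` follow from that assumption, and all function names are illustrative.

```python
import numpy as np

def state_space_df2(b, a):
    """A, B, C, D for a 3rd-order filter, b = [b0, b1, b2, b3], a = [a1, a2, a3]."""
    b0, b1, b2, b3 = b
    a1, a2, a3 = a
    A = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [-a3, -a2, -a1]])
    B = np.array([0.0, 0.0, 1.0])
    C = np.array([b3 - a3 * b0, b2 - a2 * b0, b1 - a1 * b0])
    D = b0
    return A, B, C, D

def simulate(A, B, C, D, u):
    """Iterate x(n+1) = A x(n) + B u(n), y(n) = C x(n) + D u(n) from zero state."""
    x = np.zeros(len(B))
    y = []
    for un in u:
        y.append(C @ x + D * un)
        x = A @ x + B * un
    return np.array(y)

def difference_equation(b, a, u):
    """Reference output computed directly from the difference equation."""
    y = np.zeros(len(u))
    for n in range(len(u)):
        acc = sum(b[k] * u[n - k] for k in range(4) if n - k >= 0)
        acc -= sum(a[k - 1] * y[n - k] for k in range(1, 4) if n - k >= 0)
        y[n] = acc
    return y

b = [0.5, 0.3, -0.2, 0.1]
a = [-0.4, 0.25, -0.1]
u = np.array([1.0, 0.0, 2.0, -1.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0])
print(np.allclose(simulate(*state_space_df2(b, a), u), difference_equation(b, a, u)))
```

A different assignment of states to registers would give different matrices with the same input-output behavior.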


Exercise: 


Problem: Is the state-variable description of a filter H(z) unique? 


Exercise: 


Problem: Does the state-variable description fully describe the signal flow graph? 


State-Variable Transformation


Suppose we wish to define a new set of state variables, related to the old set by a linear transformation:
$q(n) = Tx(n)$, where $T$ is a nonsingular $M \times M$ matrix, and $q(n)$ is the new state vector. We wish the
overall system to remain the same. Note that $x(n) = T^{-1}q(n)$, and thus

$$x(n+1) = Ax(n) + Bu(n) \Rightarrow T^{-1}q(n+1) = AT^{-1}q(n) + Bu(n) \Rightarrow q(n+1) = TAT^{-1}q(n) + TBu(n)$$
$$y(n) = Cx(n) + Du(n) \Rightarrow y(n) = CT^{-1}q(n) + Du(n)$$

This defines a new state system with an input-output behavior identical to the old system, but with different
internal memory contents (states) and state matrices.

$$q(n+1) = \hat{A}q(n) + \hat{B}u(n)$$
$$y(n) = \hat{C}q(n) + \hat{D}u(n)$$
$$\hat{A} = TAT^{-1}, \quad \hat{B} = TB, \quad \hat{C} = CT^{-1}, \quad \hat{D} = D$$

These transformations can be used to generate a wide variety of alternative structures or implementations of a
filter.


Transfer Function and the State-Variable Description 
Taking the z transform of the state equations

$$Z[x(n+1)] = Z[Ax(n) + Bu(n)]$$
$$Z[y(n)] = Z[Cx(n) + Du(n)]$$

$$zX(z) = AX(z) + BU(z)$$

Note: $X(z)$ is a vector of scalar z-transforms, $X(z)^T = \left(X_1(z) \;\; X_2(z) \;\; \dots\right)$

$$Y(z) = CX(z) + DU(z)$$

$$(zI - A)X(z) = BU(z) \Rightarrow X(z) = (zI - A)^{-1}BU(z)$$

Equation:

$$Y(z) = C(zI - A)^{-1}BU(z) + DU(z) = \left(C(zI - A)^{-1}B + D\right)U(z)$$

and thus

$$H(z) = C(zI - A)^{-1}B + D$$

Note that since $(zI - A)^{-1} = \frac{\operatorname{adj}(zI - A)}{\det(zI - A)}$, this transfer function is an $M$th-order rational fraction in $z$.
The denominator polynomial is $D(z) = \det(zI - A)$. A discrete-time state system is thus stable if the $M$
roots of $\det(zI - A)$ (i.e., the poles of the digital filter) are all inside the unit circle.


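Consider the transformed state system with $\hat{A} = TAT^{-1}$, $\hat{B} = TB$, $\hat{C} = CT^{-1}$, $\hat{D} = D$:
Equation:

$$\begin{aligned}
H(z) &= \hat{C}\left(zI - \hat{A}\right)^{-1}\hat{B} + \hat{D} \\
&= CT^{-1}\left(zI - TAT^{-1}\right)^{-1}TB + D \\
&= CT^{-1}\left(T(zI - A)T^{-1}\right)^{-1}TB + D \\
&= CT^{-1}\left(T^{-1}\right)^{-1}(zI - A)^{-1}T^{-1}TB + D \\
&= C(zI - A)^{-1}B + D
\end{aligned}$$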


This proves that state-variable transformation doesn't change the transfer function of the underlying system. 
However, it can provide alternate forms that are less sensitive to coefficient quantization or easier to analyze, 
understand, or implement. 
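
The invariance of the transfer function under a change of state variables can also be confirmed numerically. The NumPy sketch below uses an arbitrary stable third-order system and an arbitrary nonsingular $T$; all numerical values and the helper name `transfer_function` are assumptions chosen for illustration.

```python
import numpy as np

def transfer_function(A, B, C, D, z):
    """Evaluate H(z) = C (zI - A)^{-1} B + D at one complex point z."""
    M = A.shape[0]
    return (C @ np.linalg.solve(z * np.eye(M) - A, B) + D).item()

# An arbitrary third-order state description (assumed values).
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.1, -0.25, 0.5]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[0.3, -0.2, 0.7]])
D = 0.5

# Any nonsingular T defines new states q(n) = T x(n).
T = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
T_inv = np.linalg.inv(T)
A_hat, B_hat, C_hat, D_hat = T @ A @ T_inv, T @ B, C @ T_inv, D

z = 0.9 * np.exp(0.7j)                    # a test point that is not a pole
H = transfer_function(A, B, C, D, z)
H_hat = transfer_function(A_hat, B_hat, C_hat, D_hat, z)
print(abs(H - H_hat))                     # zero up to rounding error
print(np.poly(A), np.poly(A_hat))         # same characteristic polynomial det(zI - A)
```

The matrices differ entry by entry, yet the poles and the transfer function are unchanged, which is why the transformation can be used to look for better-conditioned realizations.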


State-variable descriptions of systems are useful because they provide a fairly general tool for analyzing all 
systems; they provide a more detailed description of a signal flow graph than does the transfer function 
(although not a full description); and they suggest a large class of alternative implementations. They are even 
more useful in control theory, which is largely based on state descriptions of systems. 


Fixed-Point Number Representation 


Fixed-point arithmetic is generally used when hardware cost, speed, or 
complexity is important. Finite-precision quantization issues usually arise in 
fixed-point systems, so we concentrate on fixed-point quantization and 
error analysis in the remainder of this course. For basic signal processing 
computations such as digital filters and FFTs, the magnitude of the data, the 
internal states, and the output can usually be scaled to obtain good 
performance with a fixed-point implementation. 


Two's-Complement Integer Representation 


As far as the hardware is concerned, fixed-point number systems represent 
data as B-bit integers. The two's-complement number system is usually 
used: 


$$k = \begin{cases} \text{binary integer representation} & \text{if } 0 \le k \le 2^{B-1} - 1 \\ \text{bit-by-bit inverse}(-k) + 1 & \text{if } -2^{B-1} \le k < 0 \end{cases}$$


The most significant bit is known as the sign bit; it is 0 when the number is
non-negative; 1 when the number is negative.
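
The encoding rule can be written out in a few lines of Python. The helper names `twos_complement` and `fractional_value` are illustrative. The second helper reads a bit pattern as a binary fraction with the binary point placed after the sign bit.

```python
def twos_complement(k, B):
    """Return the B-bit two's-complement bit string for the integer k."""
    if not -(1 << (B - 1)) <= k <= (1 << (B - 1)) - 1:
        raise ValueError("k does not fit in B bits")
    mask = (1 << B) - 1
    if k >= 0:
        return format(k, f"0{B}b")            # ordinary binary representation
    inverted = ~(-k) & mask                   # bit-by-bit inverse of -k
    return format((inverted + 1) & mask, f"0{B}b")

def fractional_value(bits):
    """Interpret a two's-complement bit string as a fraction in [-1, 1)."""
    B = len(bits)
    k = int(bits, 2)
    if bits[0] == "1":                        # sign bit set: negative number
        k -= 1 << B
    return k / (1 << (B - 1))

print(twos_complement(2, 3), twos_complement(-3, 3))
print(fractional_value("010"), fractional_value("101"))
```

With $B = 3$, the integers 2 and $-3$ map to the patterns 010 and 101, which read as the fractions $1/2$ and $-3/4$.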


Fractional Fixed-Point Number Representation 


For the purposes of signal processing, we often regard the fixed-point
numbers as binary fractions in $[-1, 1)$, by implicitly placing a
binary point after the sign bit.


or 


This interpretation makes it clearer how to implement digital filters in 
fixed-point, at least when the coefficients have a magnitude less than 1. 


Truncation Error 


Consider the multiplication of two binary fractions 


                   Fractional        Integer
                   Interpretation    Interpretation

      0.10         1/2               2
    × 0.11         × 3/4             × 3
    ------
       010
      010
     000
    ------
    0.0110         3/8               6


Note that full-precision multiplication almost doubles the number of bits; if 
we wish to return the product to a B-bit representation, we must truncate 
the B — 1 least significant bits. However, this introduces truncation error 
(also known as quantization error, or roundoff error if the number is 
rounded to the nearest B-bit fractional value rather than truncated). Note 
that this occurs after multiplication. 


Overflow Error 


Consider the addition of two binary fractions; 


                   Fractional        Integer
                   Interpretation    Interpretation

      0.10         1/2               2
    + 0.11         + 3/4             + 3
    ------
      1.01         5/4 = -3/4        5 = -3


Note the occurrence of wraparound overflow; this only happens with
addition. Obviously, it can be a bad problem.


There are thus two types of fixed-point error: roundoff error, associated 
with data quantization and multiplication, and overflow error, associated 
with data quantization and additions. In fixed-point systems, one must 
strike a balance between these two error sources; by scaling down the data, 
the occurrence of overflow errors is reduced, but the relative size of the
roundoff error is increased. 


Note: Since multiplies require a number of additions, they are especially
expensive in terms of hardware (with a complexity proportional to $B_x B_h$,
where $B_x$ is the number of bits in the data, and $B_h$ is the number of bits in
the filter coefficients). Designers try to minimize both $B_x$ and $B_h$, and
often choose $B_x \ne B_h$!


Fixed-Point Quantization 


The fractional $B$-bit two's complement number representation evenly
distributes $2^B$ quantization levels between $-1$ and $1 - 2^{-(B-1)}$. The
spacing between quantization levels is then

$$\frac{2}{2^B} = 2^{-(B-1)} \doteq \Delta_B$$


Any signal value falling between two levels is assigned to one of the two 
levels. 


$x_q = Q[x]$ is our notation for quantization. $e = Q[x] - x$ is then the
quantization error.


One method of quantization is rounding, which assigns the signal value to
the nearest level. The maximum error is thus $\frac{\Delta_B}{2} = 2^{-B}$.

[Figure: $Q[x]$ for a rounding quantizer]


Another common scheme, which is often easier to implement in hardware,
is truncation. $Q[x]$ assigns $x$ to the next lowest level.

[Figure: $Q[x]$ for a truncating quantizer]

The worst-case error with truncation is $\Delta = 2^{-(B-1)}$, which is twice as
large as with rounding. Also, the error is always negative (or zero), so on average it
may have a non-zero mean (i.e., a bias component).
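
The two quantizers can be sketched as follows, assuming inputs that stay inside $[-1, 1)$ so that overflow is not an issue. The function names are illustrative, and rounding here sends exact ties upward, which is one of several possible tie rules.

```python
import math

def quantize_round(x, B):
    """Round x to the nearest multiple of delta = 2^-(B-1)."""
    delta = 2.0 ** -(B - 1)
    return math.floor(x / delta + 0.5) * delta

def quantize_truncate(x, B):
    """Truncate x to the next lowest multiple of delta = 2^-(B-1)."""
    delta = 2.0 ** -(B - 1)
    return math.floor(x / delta) * delta

B = 4                                    # delta = 1/8
for x in (0.3, 0.2, -0.3):
    print(x, quantize_round(x, B), quantize_truncate(x, B))
```

For $x = 0.2$ the two schemes pick different levels ($0.25$ and $0.125$), and for negative inputs truncation still moves toward $-\infty$, which is what two's-complement truncation does.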


Overflow is the other problem. There are two common types: two's
complement (or wraparound) overflow, or saturation overflow.

[Figure: $Q[x]$ with wraparound overflow]

[Figure: $Q[x]$ with saturation overflow]


Obviously, overflow errors are bad because they are typically large; two's
complement (or wraparound) overflow introduces more error than
saturation, but is easier to implement in hardware. It also has the advantage
that if the sum of several numbers is in $[-1, 1)$, the final answer will
be correct even if intermediate sums overflow! However, wraparound
overflow leaves IIR systems susceptible to zero-input large-scale limit
cycles, as discussed in another module. As usual, there are many tradeoffs
to evaluate, and no one right answer for all applications.
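
The "intermediate overflow is harmless" property of wraparound arithmetic is easy to demonstrate with $B$-bit integers. The sketch below uses $B = 3$ (integers from $-4$ to $3$, or fractions from $-1$ to $3/4$ in steps of $1/4$); the helper names `wrap`, `saturate`, and `accumulate` are illustrative.

```python
def wrap(k, B):
    """Two's-complement (wraparound) overflow: reduce k modulo 2^B."""
    half = 1 << (B - 1)
    return (k + half) % (1 << B) - half

def saturate(k, B):
    """Saturation overflow: clip k to the representable range."""
    half = 1 << (B - 1)
    return max(-half, min(half - 1, k))

def accumulate(values, B, overflow):
    """Add the values one at a time, applying the overflow rule after each add."""
    acc = 0
    for v in values:
        acc = overflow(acc + v, B)
    return acc

values = [2, 3, -4]                      # fractions 1/2 + 3/4 - 1 = 1/4
print(accumulate(values, 3, wrap))       # intermediate sum 5 wraps to -3, final result is still 1
print(accumulate(values, 3, saturate))   # intermediate sum clips to 3, final result is wrong
```

The true sum is 1, which is representable. Wraparound recovers it because every step is exact modulo $2^B$, while saturation loses information at the first overflow.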


Finite-Precision Error Analysis 


Fundamental Assumptions in finite-precision error analysis 


Quantization is a highly nonlinear process and is very difficult to analyze 
precisely. Approximations and assumptions are made to make analysis 
tractable. 


Assumption #1 


The roundoff or truncation errors at any point in a system at each time are 
random, stationary, and statistically independent (white and independent 
of all other quantizers in a system). 


That is, the error autocorrelation function is $r_e[k] = E[e_n e_{n+k}] = \sigma_q^2\,\delta[k]$.
Intuitively, and confirmed experimentally in some (but not all!) cases, one
expects the quantization error to have a uniform distribution over the
interval $\left[-\frac{\Delta}{2}, \frac{\Delta}{2}\right)$ for rounding, or $(-\Delta, 0]$ for truncation.


In this case, rounding has zero mean and variance

$$E\left[Q[x_n] - x_n\right] = 0$$

$$\sigma_Q^2 = E\left[e_n^2\right] = \frac{\Delta_B^2}{12}$$

and truncation has the statistics

$$E\left[Q[x_n] - x_n\right] = -\frac{\Delta}{2}$$

$$\sigma_Q^2 = \frac{\Delta_B^2}{12}$$

Please note that the independence assumption may be very bad (for
example, when quantizing a sinusoid with an integer period N).

There is another quantizing scheme called dithering, in which the values
are randomly assigned to nearby quantization levels. This can be (and
often is) implemented by adding a small (one- or two-bit) random input to
the signal before a truncation or rounding quantizer.


[Figure: a dither signal is added to the input ahead of the quantizer.]


This is used extensively in practice. Although the overall error is
somewhat higher, it is spread evenly over all frequencies, rather than
being concentrated in spectral lines. This is very important when
quantizing sinusoidal or other periodic signals, for example.
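The effect of dither on a deterministic error can be seen in a small
Python sketch. It assumes a uniform dither one quantization step wide and
a constant input lying between two levels; both choices are illustrative
and not taken from the text.

```python
import math
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
delta = 1.0              # work in units of one quantization step
x = 0.3 * delta          # constant input lying between two quantization levels

def q_round(v):
    """Round to the nearest multiple of delta."""
    return math.floor(v / delta + 0.5) * delta

n = 20_000
plain = [q_round(x) for _ in range(n)]
dithered = [q_round(x + rng.uniform(-delta / 2, delta / 2)) for _ in range(n)]

plain_mean = sum(plain) / n        # always 0: the error is a fixed offset
dithered_mean = sum(dithered) / n  # near 0.3: the error now averages out
print(plain_mean, dithered_mean)
```

Without dither the error repeats identically on every sample, which is
what concentrates it in spectral lines for periodic inputs; with dither
each sample lands on one of the two neighboring levels at random, so the
error behaves like noise.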


Assumption #2 


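Pretend that the quantization error is really additive Gaussian noise with
the same mean and variance as the uniform quantizer. That is, model the
quantizer output $Q[x_n]$ as the sum $x_n + e_n$ of the unquantized signal
and an additive noise source $e_n$.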


This model is a linear system, which our standard theory can handle easily. 
We model the noise as Gaussian because it remains Gaussian after passing 
through filters, so analysis in a system context is tractable. 


Summary of Useful Statistical Facts 


• correlation function $r_x[k] = E[x_n x_{n+k}]$

• power spectral density $S_x(\omega) = \mathrm{DTFT}[r_x[n]]$

• Note $r_x[0] = \sigma_x^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} S_x(\omega)\,d\omega$

• $r_{xy}[k] = E[x^*[n]\,y[n+k]]$

• cross-spectral density $S_{xy}(\omega) = \mathrm{DTFT}[r_{xy}[n]]$

• For $y = h * x$:

  $$S_{yy}(\omega) = |H(\omega)|^2 S_x(\omega)$$

• Note that the output noise level after filtering a noise sequence is

  $$\sigma_y^2 = r_{yy}[0] = \frac{1}{2\pi}\int_{-\pi}^{\pi} |H(\omega)|^2 S_x(\omega)\,d\omega$$

  so postfiltering quantization noise alters the noise power spectrum and
  may change its variance!

• For $x_1$, $x_2$ statistically independent

  $$r_{x_1+x_2}[k] = r_{x_1}[k] + r_{x_2}[k]$$

  $$S_{x_1+x_2}(\omega) = S_{x_1}(\omega) + S_{x_2}(\omega)$$

• For independent random variables

  $$\sigma_{x_1+x_2}^2 = \sigma_{x_1}^2 + \sigma_{x_2}^2$$


Input Quantization Noise Analysis 


All practical analog-to-digital converters (A/D) must quantize the input 
data. This can be modeled as an ideal sampler followed by a B-bit 
quantizer. 


[Figure: ideal sampler followed by a B-bit quantizer.]


The signal-to-noise ratio (SNR) of an A/D is
Equation:

$$\mathrm{SNR} = 10\log_{10}\frac{P_x}{P_n}
= 10\log_{10}P_x - 10\log_{10}\frac{\Delta_B^2}{12}
= 10\log_{10}P_x + 4.77 + 6.02B$$

where $P_x$ is the power in the signal and $P_n$ is the power of the
quantization noise, which equals its variance if it has a zero mean. The
last step uses $\Delta_B = 2^{-(B-1)}$. The SNR increases by 6 dB with each
additional bit.
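The constants in the SNR expression can be checked numerically. This is a
minimal sketch, assuming the step size $2^{-(B-1)}$ (the value that
reproduces the 4.77 and 6.02 constants); the function name is hypothetical.

```python
import math

def quantization_noise_term_db(bits):
    """Return -10*log10(delta^2 / 12) for delta = 2^-(bits - 1)."""
    delta = 2.0 ** -(bits - 1)
    return -10.0 * math.log10(delta ** 2 / 12.0)

for bits in (8, 12, 16):
    exact = quantization_noise_term_db(bits)
    approx = 4.77 + 6.02 * bits
    print(bits, round(exact, 2), round(approx, 2))

# Gain from one extra bit: 20*log10(2), about 6.02 dB.
per_bit = quantization_noise_term_db(9) - quantization_noise_term_db(8)
print(round(per_bit, 2))
```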


Quantization Error in FIR Filters 


In digital filters, both the data at various places in the filter, which are 
continually varying, and the coefficients, which are fixed, must be 
quantized. The effects of quantization on data and coefficients are quite 
different, so they are analyzed separately. 


Data Quantization 


Typically, the input and output in a digital filter are quantized by the analog- 
to-digital and digital-to-analog converters, respectively. Quantization also 
occurs at various points in a filter structure, usually after a multiply, since 
multiplies increase the number of bits. 


Direct-form Structures 


There are two common possibilities for quantization in a direct-form FIR 
filter structure: after each multiply, or only once at the end. 


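[Figure: direct-form FIR filter with a quantizer after each multiply.]
Single-precision accumulate; total variance $M\frac{\Delta^2}{12}$


[Figure: direct-form FIR filter with a single quantizer after the final sum.]
Double-precision accumulate; variance $\frac{\Delta^2}{12}$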


In the latter structure, a double-length accumulator adds all $2B - 1$ bits
of each product into the accumulating sum, and truncates only at the end.
Obviously, this is much preferred, and should always be used wherever
possible. All DSP microprocessors and most general-purpose computers
support double-precision accumulation.
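The difference between the two accumulation strategies can be simulated
directly. The sketch below is illustrative only: it assumes 12-bit words
with step $2^{-11}$, a length-16 filter with random coefficients, and
rounding (rather than truncation) as the quantizer; all names are
hypothetical.

```python
import math
import random

B = 12
delta = 2.0 ** -(B - 1)      # assumed step size for B-bit fractions
M = 16                       # illustrative filter length
rng = random.Random(2)       # fixed seed so the run is repeatable

def q_round(v):
    """Round to the nearest multiple of delta."""
    return math.floor(v / delta + 0.5) * delta

def random_word(scale):
    """A random multiple of delta with magnitude at most scale."""
    return q_round(rng.uniform(-scale, scale))

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

single_err, double_err = [], []
for _ in range(5000):
    h = [random_word(1.0 / M) for _ in range(M)]   # sum of |h| <= 1, so no overflow
    x = [random_word(0.9) for _ in range(M)]
    exact = sum(hk * xk for hk, xk in zip(h, x))
    single = sum(q_round(hk * xk) for hk, xk in zip(h, x))  # quantize every product
    double = q_round(exact)                                   # quantize once at the end
    single_err.append(single - exact)
    double_err.append(double - exact)

ratio_single = variance(single_err) / (delta ** 2 / 12)
ratio_double = variance(double_err) / (delta ** 2 / 12)
print(ratio_single, ratio_double)   # roughly M and roughly 1
```

Both ratios are measured in units of $\Delta^2/12$, so the first should be
near the filter length and the second near one, in line with the variances
quoted for the two structures.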


Transpose-form 


Similarly, the transpose-form FIR filter structure presents two common 
options for quantization: after each multiply, or once at the end. 


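[Figure: transpose-form FIR filter quantized at each stage.]
Quantize at each stage before storing intermediate sum. Output
variance $M\frac{\Delta^2}{12}$


or


[Figure: transpose-form FIR filter with double-precision partial sums.]
Store double-precision partial sums. Costs more memory, but variance
$\frac{\Delta^2}{12}$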


The transpose form is not as convenient in terms of supporting double- 
precision accumulation, which is a significant disadvantage of this 
structure. 


Coefficient Quantization 


Since a quantized coefficient is fixed for all time, we treat it differently than 
data quantization. The fundamental question is: how much does the 
quantization affect the frequency response of the filter? 


The quantized filter frequency response is

$$\mathrm{DTFT}[h_Q] = \mathrm{DTFT}[h_{\text{inf. prec.}} + e]
= H_{\text{inf. prec.}}(\omega) + H_e(\omega)$$

Assuming the quantization model is correct, $H_e(\omega)$ should be fairly
random and white, with the error spread fairly equally over all frequencies
$\omega \in [-\pi, \pi)$; however, the randomness of this error destroys any
equiripple property or any infinite-precision optimality of a filter.
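The error response $H_e(\omega)$ can be computed for a concrete filter. The
sketch below assumes 8-bit rounding with step $2^{-7}$ and uses a made-up
symmetric coefficient set; neither comes from the text. It also checks the
simple bound $|H_e(\omega)| \le M\Delta/2$, which follows from each
coefficient error being at most half a step.

```python
import cmath
import math

B = 8
delta = 2.0 ** -(B - 1)      # assumed step size for B-bit coefficients

def q_round(x):
    """Round to the nearest multiple of delta."""
    return math.floor(x / delta + 0.5) * delta

def freq_response(h, w):
    """DTFT of the finite sequence h evaluated at frequency w (radians/sample)."""
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))

# Illustrative symmetric (linear-phase) lowpass coefficients, not from the text.
h_ideal = [0.02, 0.11, 0.23, 0.28, 0.23, 0.11, 0.02]
h_q = [q_round(hk) for hk in h_ideal]
e = [hq - hk for hq, hk in zip(h_q, h_ideal)]

grid = [math.pi * i / 256 for i in range(-256, 256)]
max_err = max(abs(freq_response(h_q, w) - freq_response(h_ideal, w)) for w in grid)
bound = len(h_ideal) * delta / 2      # since |H_e(w)| <= sum of |e[k]|
print(max_err, bound)
```

Because rounding a symmetric coefficient set yields symmetric quantized
coefficients, `h_q` here remains linear-phase even though its magnitude
response has changed.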

Exercise: 


Problem: 


What quantization scheme minimizes the $L_2$ quantization error in
frequency (minimizes $\int_{-\pi}^{\pi} |H(\omega) - H_Q(\omega)|^2\,d\omega$)?
On average, how big is this error?


Ideally, if one knows the coefficients are to be quantized to $B$ bits, one
should incorporate this directly into the filter design problem, and find
the $M$ $B$-bit binary fractional coefficients minimizing the maximum
deviation ($L_\infty$ error). This can be done, but it is an integer
program, which is known to be NP-hard (i.e., no efficient exact algorithm
is known, so solving it can require something close to a brute-force
search). This is so expensive computationally that it is rarely done.
There are some sub-optimal methods that are much more efficient and
usually produce good results.


Data Quantization in IIR Filters 


Finite-precision effects are much more of a concern with IIR filters than 
with FIR filters, since the effects are more difficult to analyze and 
minimize, coefficient quantization errors can cause the filters to become 
unstable, and disastrous things like large-scale limit cycles can occur. 


Roundoff noise analysis in IIR filters 


Suppose there are several quantization points in an IIR filter structure. By
our simplifying assumptions about quantization error and Parseval's
theorem, the quantization noise variance $\sigma_{y,i}^2$ at the output of
the filter from the $i$th quantizer is

Equation:

$$\sigma_{y,i}^2
= \frac{1}{2\pi}\int_{-\pi}^{\pi} |H_i(\omega)|^2 S_{n_i}(\omega)\,d\omega
= \frac{\sigma_{n_i}^2}{2\pi}\int_{-\pi}^{\pi} |H_i(\omega)|^2\,d\omega
= \sigma_{n_i}^2 \sum_{n=-\infty}^{\infty} h_i^2[n]$$

where $\sigma_{n_i}^2$ is the variance of the quantization error at the
$i$th quantizer, $S_{n_i}(\omega)$ is the power spectral density of that
quantization error, and $H_i(\omega)$ is the transfer function from the
$i$th quantizer to the output point. Thus for $P$ independent quantizers in
the structure, the total quantization noise variance is

$$\sigma_y^2 = \frac{1}{2\pi}\sum_{i=1}^{P} \sigma_{n_i}^2
\int_{-\pi}^{\pi} |H_i(\omega)|^2\,d\omega$$


Note that in general, each $H_i(\omega)$, and thus the variance at the
output due to each quantizer, is different; for example, the system as seen
by a quantizer at the input to the first delay state in the Direct-Form II
IIR filter structure to the output, call it $n_4$, is

[Figure: signal-flow graph from the noise source $n_4$ to the output.]

with a transfer function

$$H_4(z) = \frac{z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}$$

which can be evaluated at $z = e^{j\omega}$ to obtain the frequency response.


A general approach to find $H_i(\omega)$ is to write state equations for the
equivalent structure as seen by $n_i$, and to determine the transfer
function according to $H(z) = C(zI - A)^{-1}B + d$.
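By Parseval's theorem the noise gain from a quantizer to the output equals
the sum of the squared impulse response, which is easy to approximate
numerically. The sketch below does this for a transfer function of the
form $z^{-2}/(1 + a_1 z^{-1} + a_2 z^{-2})$; the function name and the
truncation length are assumptions made for illustration.

```python
def noise_gain_h4(a1, a2, n_terms=2000):
    """Approximate sum of h[n]^2 for H(z) = z^-2 / (1 + a1 z^-1 + a2 z^-2).

    The impulse response obeys h[n] = d[n-2] - a1*h[n-1] - a2*h[n-2],
    and the infinite sum is truncated after n_terms samples.
    """
    h1 = h2 = 0.0          # h[n-1] and h[n-2]
    total = 0.0
    for n in range(n_terms):
        h = (1.0 if n == 2 else 0.0) - a1 * h1 - a2 * h2
        total += h * h
        h1, h2 = h, h1
    return total

print(noise_gain_h4(0.0, 0.0))     # pure delay: gain of 1
print(noise_gain_h4(-0.5, 0.0))    # single pole at 0.5: 1 / (1 - 0.25)
print(noise_gain_h4(-1.8, 0.9))    # poles near the unit circle: much larger gain
```

Multiplying the returned gain by the quantizer variance gives that
quantizer's contribution to the output noise variance. The truncated sum
is only meaningful for stable coefficient pairs.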


Exercise: 


Problem: 


The above figure illustrates the quantization points in a typical 
implementation of a Direct-Form II IIR second-order section. What is 
the total variance of the output error due to all of the quantizers in the 
system? 


By making the assumption that each $Q_i$ represents a noise source that is
white, independent of the other sources, and additive, the variance at the
output is the sum of the variances at the output due to each noise source:

$$\sigma_y^2 = \sum_{i=1}^{4} \sigma_{y,i}^2$$

The variance due to each noise source at the output can be determined from
$\frac{1}{2\pi}\int_{-\pi}^{\pi} |H_i(\omega)|^2 S_{n_i}(\omega)\,d\omega$;
note that $S_{n_i}(\omega) = \sigma_{n_i}^2$ by our assumptions, and
$H_i(\omega)$ is the transfer function from the noise source to the output.


IIR Coefficient Quantization Analysis 


Coefficient quantization is an important concern with IIR filters, since
straightforward quantization often yields poor results, and because
quantization can produce unstable filters.


Sensitivity analysis 


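The performance and stability of an IIR filter depend on the pole locations,
so it is important to know how quantization of the filter coefficients $a_k$
affects the pole locations $p_i$. The denominator polynomial is

$$D(z) = 1 + \sum_{k=1}^{N} a_k z^{-k} = \prod_{i=1}^{N}\left(1 - p_i z^{-1}\right)$$

We wish to know $\frac{\partial p_i}{\partial a_k}$, which, for small
deviations, will tell us that a $\delta$ change in $a_k$ yields an
$\epsilon = \delta\frac{\partial p_i}{\partial a_k}$ change in the pole
location. $\frac{\partial p_i}{\partial a_k}$ is the sensitivity of the pole
location to quantization of $a_k$. We can find
$\frac{\partial p_i}{\partial a_k}$ using the chain rule. Because
$D(p_i) = 0$ must continue to hold as $a_k$ changes,

$$\left.\frac{\partial D(z)}{\partial a_k}\right|_{z=p_i}
+ \left.\frac{\partial D(z)}{\partial z}\right|_{z=p_i}\frac{\partial p_i}{\partial a_k} = 0
\;\Rightarrow\;
\frac{\partial p_i}{\partial a_k}
= -\frac{\left.\partial D(z)/\partial a_k\right|_{z=p_i}}{\left.\partial D(z)/\partial z\right|_{z=p_i}}$$

which is
Equation:

$$\frac{\partial p_i}{\partial a_k}
= \left.\frac{z^{-k}}{-\left(z^{-1}\prod_{j=1,\,j\neq i}^{N}\left(1 - p_j z^{-1}\right)\right)}\right|_{z=p_i}
= \frac{-p_i^{N-k}}{\prod_{j=1,\,j\neq i}^{N}\left(p_i - p_j\right)}$$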

Note that as the poles get closer together, the sensitivity increases greatly. 
So as the filter order increases and more poles get stuffed closer together 
inside the unit circle, the error introduced by coefficient quantization in the 
pole locations grows rapidly. 
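The growth of sensitivity as poles approach each other can be checked
numerically for a second-order denominator. The sketch below compares the
closed-form sensitivity $-p_i^{N-k}/(p_i - p_j)$ for $N = 2$ against a
central finite difference; the two coefficient pairs and all function names
are illustrative assumptions.

```python
import cmath

def poles(a1, a2):
    """Roots of z^2 + a1*z + a2, i.e. the poles of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    root = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + root) / 2.0, (-a1 - root) / 2.0

def sensitivity(p_i, p_other, k, order=2):
    """Closed-form dp_i/da_k = -p_i^(N-k) / (p_i - p_other) for N = 2."""
    return -(p_i ** (order - k)) / (p_i - p_other)

def numeric_sensitivity(a1, a2, k, step=1e-7):
    """Central finite difference of the first pole with respect to a_k."""
    da1, da2 = (step, 0.0) if k == 1 else (0.0, step)
    p_plus, _ = poles(a1 + da1, a2 + da2)
    p_minus, _ = poles(a1 - da1, a2 - da2)
    return (p_plus - p_minus) / (2.0 * step)

# Poles at 0.5 +/- 0.5j (far apart) and at 0.95 +/- 0.1j (close together).
for a1, a2 in [(-1.0, 0.5), (-1.9, 0.9125)]:
    p1, p2 = poles(a1, a2)
    for k in (1, 2):
        print(abs(p1 - p2), k,
              abs(sensitivity(p1, p2, k)),
              abs(numeric_sensitivity(a1, a2, k)))
```

The pole pair separated by 0.2 shows roughly five times the sensitivity
to $a_2$ of the pair separated by 1, in line with the inverse dependence
on pole spacing.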


How can we reduce this high sensitivity to IIR filter coefficient 
quantization? 


Solution 


Cascade or parallel form implementations! The numerator and denominator 
polynomials can be factored off-line at very high precision and grouped into 
second-order sections, which are then quantized section by section. The 
sensitivity of the quantization is thus that of second-order, rather than N-th 
order, polynomials. This yields major improvements in the frequency 
response of the overall filter, and is almost always done in practice. 


Note that the numerator polynomial faces the same sensitivity issues; the 
cascade form also improves the sensitivity of the zeros, because they are 
also factored into second-order terms. However, in the parallel form, the 
zeros are globally distributed across the sections, so they suffer from 
quantization of all the blocks. Thus the cascade form preserves zero 
locations much better than the parallel form, which typically means that the 
stopband behavior is better in the cascade form, so it is most often used in 
practice. 


Note: On the basis of the preceding analysis, it would seem important to
use cascade structures in FIR filter implementations. However, most FIR
filters are linear-phase and thus symmetric or anti-symmetric. As long as
the quantization is implemented such that the filter coefficients retain
symmetry, the filter retains linear phase. Furthermore, since all zeros off
the unit circle must appear in groups of four for symmetric linear-phase
filters, zero pairs can leave the unit circle only by joining with another
pair. This requires relatively severe quantizations (enough to completely
remove or change the sign of a ripple in the amplitude response). This
"reluctance" of zero pairs to leave the unit circle tends to keep
quantization from damaging the frequency response as much as might be
expected, enough so that cascade structures are rarely used for FIR filters.


Exercise: 


Problem: What is the worst-case pole pair in an IIR digital filter? 


Solution: 


The pole pair closest to the real axis in the z-plane, since the complex- 
conjugate poles will be closest together and thus have the highest 
sensitivity to quantization. 


Quantized Pole Locations 
In a direct-form or transpose-form implementation of a second-order 
section, the filter coefficients are quantized versions of the polynomial 
coefficients. 
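$$D(z) = z^2 + a_1 z + a_2 = (z - p)(z - \bar{p})$$

$$p = \frac{-a_1 \pm \sqrt{a_1^2 - 4a_2}}{2}$$

$$p = re^{j\theta}$$

$$D(z) = z^2 - 2r\cos(\theta)z + r^2$$

So

$$a_1 = -2r\cos(\theta), \qquad a_2 = r^2$$

Thus the quantization of $a_1$ and $a_2$ to $B$ bits restricts the radius
$r$ to $r = \sqrt{k\Delta_B}$, and $a_1 = -2\Re(p) = k\Delta_B$. The
following figure shows all stable pole locations after four-bit
two's-complement quantization.

[Figure: quantized pole locations for the direct-form structure.]

Note the nonuniform distribution of possible pole locations. This might be
good for poles near $r = 1$, $\theta = \frac{\pi}{2}$, but not so good for
poles near the origin or the Nyquist frequency.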
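The nonuniform grid can be enumerated directly. The following sketch
assumes a four-bit step of 0.125, and, for illustration only, lets $a_2$
range over multiples of the step in [0, 1) and $a_1$ over multiples of the
step in (-2, 2); a particular hardware word format may restrict these
ranges further.

```python
import cmath
import math

B = 4
delta = 2.0 ** -(B - 1)      # 0.125 for four-bit words

locations = set()
# Assumption for illustration: a2 = k2*delta in [0, 1) and a1 = k1*delta in (-2, 2).
for k2 in range(0, round(1 / delta)):
    for k1 in range(-round(2 / delta) + 1, round(2 / delta)):
        a1, a2 = k1 * delta, k2 * delta
        if a1 * a1 < 4 * a2:                     # complex-conjugate pole pair
            p = (-a1 + cmath.sqrt(a1 * a1 - 4 * a2)) / 2
            locations.add(p)

radii = sorted({round(abs(p), 6) for p in locations})
real_parts = sorted({p.real for p in locations})
print(radii)        # sqrt(k * delta): widely spaced near 0, crowded near 1
print(real_parts)   # uniformly spaced multiples of delta / 2
```

The available radii follow a square-root law while the real parts are
evenly spaced, which is why the grid is dense near the unit circle around
a quarter of the sampling frequency and sparse elsewhere.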


In the "normal-form" structures, a state-variable based realization, the poles 
are uniformly spaced. 


[Figure: uniformly spaced grid of quantized pole locations for the
normal-form structure.]


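This can only be accomplished if the coefficients to be quantized equal the
real and imaginary parts of the pole location; that is,

$$\alpha_1 = r\cos(\theta) = \Re(p)$$

$$\alpha_2 = r\sin(\theta) = \Im(p)$$

This is the case for a 2nd-order system with the state matrix

$$A = \begin{pmatrix} \alpha_1 & \alpha_2 \\ -\alpha_2 & \alpha_1 \end{pmatrix}$$

The denominator polynomial is

Equation:

$$\begin{aligned}
\det(zI - A) &= (z - \alpha_1)^2 + \alpha_2^2 \\
&= z^2 - 2\alpha_1 z + \alpha_1^2 + \alpha_2^2 \\
&= z^2 - 2r\cos(\theta)z + r^2\left(\cos^2(\theta) + \sin^2(\theta)\right) \\
&= z^2 - 2r\cos(\theta)z + r^2
\end{aligned}$$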
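A quick numerical check of the normal-form pole locations: the sketch
below solves the characteristic polynomial of the normal-form state matrix
for one illustrative pair of quantized coefficients (multiples of an
assumed four-bit step of 0.125). The function name is hypothetical.

```python
import cmath

def normal_form_poles(alpha1, alpha2):
    """Eigenvalues of A = [[alpha1, alpha2], [-alpha2, alpha1]].

    det(zI - A) = z^2 - 2*alpha1*z + (alpha1^2 + alpha2^2),
    solved here with the quadratic formula.
    """
    b = -2.0 * alpha1
    c = alpha1 ** 2 + alpha2 ** 2
    root = cmath.sqrt(b * b - 4.0 * c)
    return (-b + root) / 2.0, (-b - root) / 2.0

delta = 0.125                       # assumed four-bit step
p_plus, p_minus = normal_form_poles(3 * delta, 5 * delta)
print(p_plus, p_minus)              # alpha1 +/- j*alpha2
```

Since the poles are exactly the quantized coefficients taken as real and
imaginary parts, quantizing both coefficients with the same step places
the poles on a uniform rectangular grid.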


Given any second-order filter coefficient set, we can write it as a
state-space system, find a transformation matrix $T$ such that
$\hat{A} = T^{-1}AT$ is in normal form, and then implement the second-order
section using a structure corresponding to the state equations.


The normal form has a number of other advantages; both eigenvalues have
equal magnitude, so it minimizes the norm of $Ax$, which makes overflow less
likely, and it minimizes the output variance due to quantization of the
state values. It is sometimes used when minimization of finite-precision
effects is critical.

Exercise: 


Problem: What is the disadvantage of the normal form? 


Solution: 


It requires more computation. The general state-variable equation
requires nine multiplies, rather than the five used by the Direct-Form II
or Transpose-Form structures.


Limit Cycles 


Large-scale limit cycles 


When overflow occurs, even otherwise stable filters may get stuck in a 
large-scale limit cycle, which is a short-period, almost full-scale persistent 
filter output caused by overflow. 


Example: 
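Consider the second-order system

$$H(z) = \frac{1}{1 - z^{-1} + \frac{1}{2}z^{-2}}$$

[Figure: signal-flow graph of the second-order system with states $z_0$ and
$z_1$ and output $y(n)$.]

with zero input and initial state values $z_0[0] = 0.8$, $z_1[0] = -0.8$.
Note $y[n] = z_0[n+1]$.
The filter is obviously stable, since the magnitude of the poles is
$\frac{1}{\sqrt{2}} = 0.707$, which is well inside the unit circle. However,
with wraparound overflow, note that
$y[0] = z_0[1] = \frac{4}{5} - \frac{1}{2}\left(-\frac{4}{5}\right) = \frac{6}{5}$,
which wraps to $-\frac{4}{5}$, and that
$z_0[2] = y[1] = -\frac{4}{5} - \frac{1}{2}\cdot\frac{4}{5} = -\frac{6}{5}$,
which wraps to $\frac{4}{5}$, so
$y[n] = -\frac{4}{5}, \frac{4}{5}, -\frac{4}{5}, \frac{4}{5}, \ldots$ even
with zero input.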
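This example is easy to reproduce in simulation. The sketch below runs the
zero-input recursion $y[n] = y[n-1] - \frac{1}{2}y[n-2]$ from the stated
initial states using exact fractions, once with wraparound and once with
saturation. The helper names are hypothetical, and saturation is idealized
as clipping to [-1, 1].

```python
from fractions import Fraction

def wrap(x):
    """Two's-complement (wraparound) overflow: fold x into [-1, 1)."""
    return (x + 1) % 2 - 1

def saturate(x):
    """Saturation overflow, idealized here as clipping to [-1, 1]."""
    return max(Fraction(-1), min(Fraction(1), x))

def zero_input_response(overflow, steps):
    """Run y[n] = y[n-1] - (1/2) y[n-2] with zero input and the given overflow rule."""
    z0, z1 = Fraction(4, 5), Fraction(-4, 5)     # initial states from the example
    out = []
    for _ in range(steps):
        y = overflow(z0 - Fraction(1, 2) * z1)
        out.append(y)
        z0, z1 = y, z0
    return out

wrapped = zero_input_response(wrap, 6)
clipped = zero_input_response(saturate, 40)
print([str(y) for y in wrapped])     # alternates between -4/5 and 4/5
print(float(abs(clipped[-1])))       # has decayed toward zero
```

With saturation the first output clips once, after which the states stay
in range and the response decays at the rate set by the pole radius.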


Clearly, such behavior is intolerable and must be prevented. Saturation
arithmetic has been proved to prevent zero-input limit cycles, which is one
reason why all DSP microprocessors support this feature. In many
applications, this is considered sufficient protection. Scaling to prevent
overflow is another solution, provided that the initial state values are
also never set to limit-cycle-producing values. The normal-form structure
also reduces the chance of overflow.


Small-scale limit cycles 


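Small-scale limit cycles are caused by quantization. Consider the system

[Figure: first-order recursive system with feedback coefficient $\alpha$ and
a quantizer in the loop.]

Note that when $\alpha z_0 > z_0 - \frac{\Delta_B}{2}$, rounding will
quantize the output to the current level (with zero input), so the output
will remain at this level forever. Note that the maximum amplitude of this
"small-scale limit cycle" is achieved when

$$\alpha z_0 = z_0 - \frac{\Delta_B}{2}
\;\Rightarrow\;
z_{\max} = \frac{\Delta_B}{2(1 - \alpha)}$$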

In a higher-order system, the small-scale limit cycles are oscillatory in 
nature. Any quantization scheme that never increases the magnitude of any 
quantized value prevents small-scale limit cycles. 


Note: Two's-complement truncation does not do this; it increases the
magnitude of negative numbers.


However, a quantizer that never increases magnitude introduces greater
error and bias. Since the level of the limit cycles is proportional to
$\Delta_B$, they can be reduced by increasing the number of bits. Poles
close to the unit circle increase the magnitude and likelihood of
small-scale limit cycles.


Scaling 


Overflow is clearly a serious problem, since the errors it introduces are very 
large. As we shall see, it is also responsible for large-scale limit cycles, which 
cannot be tolerated. One way to prevent overflow, or to render it acceptably 
unlikely, is to scale the input to a filter such that overflow cannot (or is 
sufficiently unlikely to) occur. 


In a fixed-point system, the range of the input signal is limited by the
fractional fixed-point number representation to $|x[n]| < 1$. If we scale
the input by multiplying it by a value $\beta$, $0 < \beta < 1$, then
$|\beta x[n]| < \beta$.


Another option is to incorporate the scaling directly into the filter 
coefficients. 


[Figure: $x(n) \to \beta H \to y(n)$]


FIR Filter Scaling 


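What value of $\beta$ is required so that the output of an FIR filter cannot
overflow ($\forall n: |y(n)| < 1$) for any input satisfying
$\forall n: |x(n)| < 1$?

$$|y(n)| = \left|\sum_{k=0}^{M-1} h(k)\,\beta\,x(n-k)\right|
\le \sum_{k=0}^{M-1} |h(k)|\,|\beta|\,|x(n-k)|
< \beta\sum_{k=0}^{M-1} |h(k)|$$

$$\Rightarrow\; \beta < \frac{1}{\sum_{k=0}^{M-1} |h(k)|}$$

Alternatively, we can incorporate the scaling directly into the filter, and
require that

$$\sum_{k=0}^{M-1} |h(k)| < 1$$

to prevent overflow.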
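The scaling bound and its worst-case input can be demonstrated with a
small Python sketch. The coefficients are made up for illustration, the
function names are hypothetical, and the input is allowed to reach full
scale exactly so that the tightness of the bound is visible.

```python
def l1_scale(h):
    """Scaling bound 1 / sum|h(k)| that rules out overflow for bounded inputs."""
    return 1.0 / sum(abs(hk) for hk in h)

def fir_output(h, x, n):
    """y(n) = sum_k h(k) x(n - k), treating x as zero outside its support."""
    return sum(hk * x[n - k] for k, hk in enumerate(h) if 0 <= n - k < len(x))

h = [0.5, -1.0, 0.75, -0.25]                       # illustrative coefficients
beta = l1_scale(h)                                 # 1 / 2.5

# Worst-case input: full-scale samples whose signs line up with h(k) at n = M - 1.
M = len(h)
x_worst = [1.0 if h[M - 1 - m] >= 0 else -1.0 for m in range(M)]
y_peak = fir_output([beta * hk for hk in h], x_worst, M - 1)
print(beta, y_peak)                                # the peak reaches full scale
```

Because the sign-matched input drives the scaled output all the way to
full scale, no larger scale factor is safe for arbitrary bounded inputs.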


IIR Filter Scaling 


To prevent the output from overflowing in an IIR filter, the condition above
still holds ($M = \infty$):

$$|y(n)| < \sum_{k=0}^{\infty} |h(k)|$$

so an initial scaling factor
$\beta < \frac{1}{\sum_{k=0}^{\infty} |h(k)|}$ can be used, or the filter
itself can be scaled.


However, it is also necessary to prevent the states from overflowing, and to
prevent overflow at any point in the signal flow graph where the arithmetic
hardware would thereby produce errors. To prevent the states from
overflowing, we determine the transfer function from the input to each state
$i$, and scale the filter such that
$\forall i: \sum_{k=0}^{\infty} |h_i(k)| < 1$.


Although this method of scaling guarantees no overflows, it is often too
conservative. Note that a worst-case signal is
$x(n) = \mathrm{sign}(h(-n))$; this input may be extremely unlikely. In the
relatively common situation in which the input is expected to be mainly a
single-frequency sinusoid of unknown frequency and amplitude less than 1, a
scaling condition of

$$\forall \omega: |H(\omega)| < 1$$

is sufficient to guarantee no overflow. This scaling condition is often
used. If there are several potential overflow locations $i$ in the digital
filter structure, the scaling conditions are

$$\forall i, \omega: |H_i(\omega)| < 1$$

where $H_i(\omega)$ is the frequency response from the input to location $i$
in the filter.
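The two scaling rules can be compared numerically. The sketch below
evaluates the peak frequency-response magnitude on a uniform grid and sets
it against the sum of coefficient magnitudes for a made-up coefficient set;
the grid size and function name are assumptions, and a finite grid can
slightly underestimate the true peak.

```python
import cmath
import math

def peak_gain(h, points=2048):
    """Maximum of |H(w)| over a uniform frequency grid on [-pi, pi)."""
    best = 0.0
    for i in range(points):
        w = -math.pi + 2.0 * math.pi * i / points
        H = sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))
        best = max(best, abs(H))
    return best

h = [0.5, 1.0, -0.75, 0.25]                  # illustrative coefficients
l1_norm = sum(abs(hk) for hk in h)           # worst case over all bounded inputs
gain = peak_gain(h)                          # worst case over single sinusoids
print(1.0 / l1_norm, 1.0 / gain)             # the sinusoid-based factor is larger
```

The sinusoid-based rule permits a larger scale factor, and therefore less
loss of signal level, at the cost of allowing overflow for inputs that are
not close to a single sinusoid.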


Even this condition may be excessively conservative, for example if the input 
is more-or-less random, or if occasional overflow can be tolerated. In 
practice, experimentation and simulation are often the best ways to optimize 
the scaling factors in a given application. 


For filters implemented in the cascade form, rather than scaling for the entire 
filter at the beginning, (which introduces lots of quantization of the input) the 
filter is usually scaled so that each stage is just prevented from overflowing. 
This is best in terms of reducing the quantization noise. The scaling factors 
are incorporated either into the previous or the next stage, whichever is most 
convenient. 


Some heuristic rules for grouping poles and zeros in a cascade
implementation are:


1. Order the poles in terms of decreasing radius. Take the pole pair closest
to the unit circle and group it with the zero pair closest to that pole pair
(to minimize the gain in that section). Keep doing this with all remaining
poles and zeros.

2. Order the sections so that those with the highest peak gain
($\max_\omega |H_i(\omega)|$) are in the middle, and those with lower gain
are on the ends.


Leland B. Jackson has an excellent intuitive discussion of finite-precision 
problems in digital filters. The book by Roberts and Mullis is one of the most 
thorough in terms of detail. 


